[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

2017-01-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/518


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

2017-01-10 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/518#discussion_r95522233
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -373,93 +369,26 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB
     return complexTypesMap;
   }
 
-  /**
-   * Get the csv file to read if it the path is file otherwise get the first file of directory.
-   *
-   * @param csvFilePath
-   * @return File
-   */
-  public static CarbonFile getCsvFileToRead(String csvFilePath) {
-    CarbonFile csvFile =
-        FileFactory.getCarbonFile(csvFilePath, FileFactory.getFileType(csvFilePath));
-
-    CarbonFile[] listFiles = null;
-    if (csvFile.isDirectory()) {
-      listFiles = csvFile.listFiles(new CarbonFileFilter() {
-        @Override public boolean accept(CarbonFile pathname) {
-          if (!pathname.isDirectory()) {
-            if (pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || pathname
-                .getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION
-                + CarbonCommonConstants.FILE_INPROGRESS_STATUS)) {
-              return true;
-            }
-          }
-          return false;
-        }
-      });
-    } else {
-      listFiles = new CarbonFile[1];
-      listFiles[0] = csvFile;
-    }
-    return listFiles[0];
-  }
-
-  /**
-   * Get the file header from csv file.
-   */
-  public static String getFileHeader(CarbonFile csvFile)
-      throws DataLoadingException {
-    DataInputStream fileReader = null;
-    BufferedReader bufferedReader = null;
-    String readLine = null;
-
-    FileType fileType = FileFactory.getFileType(csvFile.getAbsolutePath());
-
-    if (!csvFile.exists()) {
-      csvFile = FileFactory
-          .getCarbonFile(csvFile.getAbsolutePath() + CarbonCommonConstants.FILE_INPROGRESS_STATUS,
-              fileType);
-    }
+  public static boolean isHeaderValid(String tableName, String[] csvHeader,
+      CarbonDataLoadSchema schema) {
+    Iterator columnIterator =
+        CarbonDataProcessorUtil.getSchemaColumnNames(schema, tableName).iterator();
+    Set csvColumns = new HashSet(csvHeader.length);
+    Collections.addAll(csvColumns, csvHeader);
 
-    try {
-      fileReader = FileFactory.getDataInputStream(csvFile.getAbsolutePath(), fileType);
-      bufferedReader =
-          new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset()));
-      readLine = bufferedReader.readLine();
-    } catch (FileNotFoundException e) {
-      LOGGER.error(e, "CSV Input File not found  " + e.getMessage());
-      throw new DataLoadingException("CSV Input File not found ", e);
-    } catch (IOException e) {
-      LOGGER.error(e, "Not able to read CSV input File  " + e.getMessage());
-      throw new DataLoadingException("Not able to read CSV input File ", e);
-    } finally {
-      CarbonUtil.closeStreams(fileReader, bufferedReader);
+    while (columnIterator.hasNext()) {
--- End diff --

Please add a comment describing this logic: the column definitions in the schema should be a subset of the input CSV header.
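The rule described here (every column defined in the schema must appear in the CSV header, while extra CSV columns are allowed) is a plain set-containment test. A minimal Java sketch, assuming the schema column names are already available as a list and both sides are already lower-cased; the class name `HeaderSubsetCheck` is hypothetical and this is not the CarbonData code:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the "schema columns are a subset of the CSV header" rule.
final class HeaderSubsetCheck {

  static boolean isHeaderValid(List<String> schemaColumns, String[] csvHeader) {
    // Collect the CSV header columns into a set for expected constant-time lookups.
    Set<String> csvColumns = new HashSet<>(csvHeader.length);
    Collections.addAll(csvColumns, csvHeader);
    // Valid when no schema column is missing from the header.
    return csvColumns.containsAll(schemaColumns);
  }

  public static void main(String[] args) {
    List<String> schema = Arrays.asList("id", "name");
    System.out.println(isHeaderValid(schema, new String[] {"id", "name", "city"})); // true
    System.out.println(isHeaderValid(schema, new String[] {"id", "city"}));         // false
  }
}
```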


[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

2017-01-10 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/518#discussion_r95521643
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -373,93 +368,25 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB
     return complexTypesMap;
   }
 
+  public static boolean isHeaderValid(String tableName, String[] csvHeader,
+      CarbonDataLoadSchema schema) {
+    Iterator columnIterator =
+        CarbonDataProcessorUtil.getSchemaColumnNames(schema, tableName).iterator();
+    Set csvColumns = new HashSet(Arrays.asList(csvHeader));
--- End diff --

You can use `Collections.addAll` instead of converting the array to a list and adding it.
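Both forms produce the same set; `java.util.Collections.addAll(Collection<? super T>, T...)` adds the array elements straight into the target collection instead of going through the intermediate `List` view returned by `Arrays.asList`. A small comparison sketch (class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class CsvColumnSets {

  // Variant from the diff: wrap the array in a List view, then copy it into a new set.
  static Set<String> viaAsList(String[] csvHeader) {
    return new HashSet<>(Arrays.asList(csvHeader));
  }

  // Variant suggested in review: create the set, then add the array elements directly.
  static Set<String> viaAddAll(String[] csvHeader) {
    Set<String> csvColumns = new HashSet<>(csvHeader.length);
    Collections.addAll(csvColumns, csvHeader);
    return csvColumns;
  }
}
```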


[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

2017-01-10 Thread QiangCai
Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/518#discussion_r95518312
  
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
@@ -301,4 +304,45 @@ object CommonUtil {
       LOGGER.info(s"mapreduce.input.fileinputformat.split.maxsize: ${ newSplitSize.toString }")
     }
   }
+
+  def getCsvHeaderColumns(carbonLoadModel: CarbonLoadModel): Array[String] = {
+    val delimiter = if (StringUtils.isEmpty(carbonLoadModel.getCsvDelimiter)) {
+      CarbonCommonConstants.COMMA
+    } else {
+      CarbonUtil.delimiterConverter(carbonLoadModel.getCsvDelimiter)
+    }
+    var csvFile: String = null
+    var csvHeader: String = carbonLoadModel.getCsvHeader
+    val csvColumns = if (StringUtils.isBlank(csvHeader)) {
+      // read header from csv file
+      csvFile = carbonLoadModel.getFactFilePath.split(",")(0)
+      csvHeader = CarbonUtil.readHeader(csvFile)
+      if (StringUtils.isBlank(csvHeader)) {
+        throw new CarbonDataLoadingException("First line of the csv is not valid.")
+      }
+      csvHeader.toLowerCase().split(delimiter).map(_.replaceAll("\"", "").trim)
+    } else {
+      csvHeader.toLowerCase.split(CarbonCommonConstants.COMMA).map(_.trim)
+    }
+
+    if (!CarbonDataProcessorUtil.isHeaderValid(carbonLoadModel.getTableName, csvColumns,
+        carbonLoadModel.getCarbonDataLoadSchema)) {
+      if (csvFile == null) {
+        LOGGER.error("CSV header provided in DDL is not proper."
+                     + " Column names in schema and CSV header are not the same.")
+        throw new CarbonDataLoadingException(
+          "CSV header provided in DDL is not proper. Column names in schema and CSV header are "
+          + "not the same.")
+      } else {
+        LOGGER.error(
+          "CSV File provided is not proper. Column names in schema and csv header are not same. "
--- End diff --

fixed


[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

2017-01-10 Thread QiangCai
Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/518#discussion_r95518311
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -373,83 +368,15 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB
     return complexTypesMap;
   }
 
-
-    try {
-      fileReader = FileFactory.getDataInputStream(csvFile.getAbsolutePath(), fileType);
-      bufferedReader =
-          new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset()));
-      readLine = bufferedReader.readLine();
-    } catch (FileNotFoundException e) {
-      LOGGER.error(e, "CSV Input File not found  " + e.getMessage());
-      throw new DataLoadingException("CSV Input File not found ", e);
-    } catch (IOException e) {
-      LOGGER.error(e, "Not able to read CSV input File  " + e.getMessage());
-      throw new DataLoadingException("Not able to read CSV input File ", e);
-    } finally {
-      CarbonUtil.closeStreams(fileReader, bufferedReader);
-    }
-
-    return readLine;
-  }
-
-  public static boolean isHeaderValid(String tableName, String header,
-      CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException {
-    delimiter = CarbonUtil.delimiterConverter(delimiter);
+  public static boolean isHeaderValid(String tableName, String[] csvHeader,
+      CarbonDataLoadSchema schema) throws DataLoadingException {
--- End diff --

fixed


[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

2017-01-10 Thread QiangCai
Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/518#discussion_r95518309
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -462,6 +389,13 @@ public static boolean isHeaderValid(String tableName, String header,
     return count == columnNames.length;
   }
 
+  public static boolean isHeaderValid(String tableName, String header,
+      CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException {
+    delimiter = CarbonUtil.delimiterConverter(delimiter);
--- End diff --

fixed


[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

2017-01-10 Thread QiangCai
Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/518#discussion_r95507937
  
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
@@ -301,4 +304,45 @@ object CommonUtil {
       LOGGER.info(s"mapreduce.input.fileinputformat.split.maxsize: ${ newSplitSize.toString }")
     }
   }
+
+  def getCsvHeaderColumns(carbonLoadModel: CarbonLoadModel): Array[String] = {
+    val delimiter = if (StringUtils.isEmpty(carbonLoadModel.getCsvDelimiter)) {
--- End diff --

I think the delimiter may be a blank " ".
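The choice between the two checks matters exactly for a single-space delimiter. Apache Commons Lang documents `StringUtils.isEmpty` as true only for null or "", and `StringUtils.isBlank` as also true for whitespace-only strings, so with `isBlank` a configured " " would fall back to a comma. The sketch below re-implements those two documented behaviours locally (the helpers are hypothetical stand-ins, not the Commons Lang classes, and the `CarbonUtil.delimiterConverter` call from the diff is omitted) to show which delimiter each variant would pick:

```java
// Hypothetical stand-ins illustrating isEmpty vs isBlank for delimiter selection.
final class DelimiterChoice {
  static final String COMMA = ",";

  // Mirrors the documented StringUtils.isEmpty: null or zero length.
  static boolean isEmpty(String s) {
    return s == null || s.isEmpty();
  }

  // Mirrors the documented StringUtils.isBlank: null, empty, or whitespace only.
  static boolean isBlank(String s) {
    if (s == null) {
      return true;
    }
    for (int i = 0; i < s.length(); i++) {
      if (!Character.isWhitespace(s.charAt(i))) {
        return false;
      }
    }
    return true;
  }

  // Selection as written in the diff (isEmpty): a single space is kept as the delimiter.
  static String chooseWithIsEmpty(String configured) {
    return isEmpty(configured) ? COMMA : configured;
  }

  // Selection with isBlank: a single space falls back to a comma.
  static String chooseWithIsBlank(String configured) {
    return isBlank(configured) ? COMMA : configured;
  }
}
```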


[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

2017-01-10 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/518#discussion_r95507943
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -373,83 +368,15 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB
     return complexTypesMap;
   }
 
-  public static boolean isHeaderValid(String tableName, String header,
-      CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException {
-    delimiter = CarbonUtil.delimiterConverter(delimiter);
+  public static boolean isHeaderValid(String tableName, String[] csvHeader,
+      CarbonDataLoadSchema schema) throws DataLoadingException {
--- End diff --

In this function you basically want to compare two String arrays to find out whether they are the same, case-insensitively.
Take a look at
http://stackoverflow.com/questions/2419061/compare-string-array-using-collection
According to this link, using a TreeSet is optimal in this case.
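A sketch of that suggestion, assuming the goal is a case-insensitive comparison of schema column names with CSV header columns: a `TreeSet` built with `String.CASE_INSENSITIVE_ORDER` answers `contains` through the comparator, so no explicit lower-casing is needed. The class and method names are hypothetical; `sameColumns` is the "same set" check described in the comment, and `coversSchema` is the subset rule the schema validation applies. Lookups are O(log n) instead of a `HashSet`'s expected O(1), which should not matter for header-sized inputs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of case-insensitive header comparison with TreeSet.
final class CaseInsensitiveHeaderCompare {

  private static Set<String> caseInsensitiveSet(String[] names) {
    // The comparator, not equals/hashCode, decides membership in a TreeSet.
    Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    Collections.addAll(set, names);
    return set;
  }

  // True when both arrays hold the same names, ignoring case, order and duplicates.
  static boolean sameColumns(String[] schemaColumns, String[] csvHeader) {
    return caseInsensitiveSet(schemaColumns).equals(caseInsensitiveSet(csvHeader));
  }

  // True when every schema column appears in the CSV header, ignoring case.
  static boolean coversSchema(String[] schemaColumns, String[] csvHeader) {
    return caseInsensitiveSet(csvHeader).containsAll(Arrays.asList(schemaColumns));
  }
}
```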


[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

2017-01-10 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/518#discussion_r95507253
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -373,83 +368,15 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB
     return complexTypesMap;
   }
 
-  public static boolean isHeaderValid(String tableName, String header,
-      CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException {
-    delimiter = CarbonUtil.delimiterConverter(delimiter);
+  public static boolean isHeaderValid(String tableName, String[] csvHeader,
+      CarbonDataLoadSchema schema) throws DataLoadingException {
     String[] columnNames =
         CarbonDataProcessorUtil.getSchemaColumnNames(schema, tableName).toArray(new String[0]);
-    String[] csvHeader = header.toLowerCase().split(delimiter);
 
-    List csvColumnsList = new ArrayList(CarbonCommonConstants.CONSTANT_SIZE_TEN);
+    List csvColumnsList = new ArrayList(csvHeader.length);
 
     for (String column : csvHeader) {
-      csvColumnsList.add(column.replaceAll("\"", "").trim());
+      csvColumnsList.add(column);
--- End diff --

use `Collections.addAll` instead


[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

2017-01-10 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/518#discussion_r95507187
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -462,6 +389,13 @@ public static boolean isHeaderValid(String tableName, String header,
     return count == columnNames.length;
   }
 
+  public static boolean isHeaderValid(String tableName, String header,
+      CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException {
+    delimiter = CarbonUtil.delimiterConverter(delimiter);
--- End diff --

Declare a local variable instead of reassigning the `delimiter` parameter.


[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

2017-01-10 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/518#discussion_r95506953
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -373,83 +368,15 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB
     return complexTypesMap;
   }
 
-  public static boolean isHeaderValid(String tableName, String header,
-      CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException {
-    delimiter = CarbonUtil.delimiterConverter(delimiter);
+  public static boolean isHeaderValid(String tableName, String[] csvHeader,
+      CarbonDataLoadSchema schema) throws DataLoadingException {
--- End diff --

I think `DataLoadingException` can be removed from the signature; the method body does not throw it.


[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

2017-01-10 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/518#discussion_r95506643
  
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
@@ -301,4 +304,45 @@ object CommonUtil {
       LOGGER.info(s"mapreduce.input.fileinputformat.split.maxsize: ${ newSplitSize.toString }")
     }
   }
+
+  def getCsvHeaderColumns(carbonLoadModel: CarbonLoadModel): Array[String] = {
+    val delimiter = if (StringUtils.isEmpty(carbonLoadModel.getCsvDelimiter)) {
+      CarbonCommonConstants.COMMA
+    } else {
+      CarbonUtil.delimiterConverter(carbonLoadModel.getCsvDelimiter)
+    }
+    var csvFile: String = null
+    var csvHeader: String = carbonLoadModel.getCsvHeader
+    val csvColumns = if (StringUtils.isBlank(csvHeader)) {
+      // read header from csv file
+      csvFile = carbonLoadModel.getFactFilePath.split(",")(0)
+      csvHeader = CarbonUtil.readHeader(csvFile)
+      if (StringUtils.isBlank(csvHeader)) {
+        throw new CarbonDataLoadingException("First line of the csv is not valid.")
+      }
+      csvHeader.toLowerCase().split(delimiter).map(_.replaceAll("\"", "").trim)
+    } else {
+      csvHeader.toLowerCase.split(CarbonCommonConstants.COMMA).map(_.trim)
+    }
+
+    if (!CarbonDataProcessorUtil.isHeaderValid(carbonLoadModel.getTableName, csvColumns,
+        carbonLoadModel.getCarbonDataLoadSchema)) {
+      if (csvFile == null) {
+        LOGGER.error("CSV header provided in DDL is not proper."
+                     + " Column names in schema and CSV header are not the same.")
+        throw new CarbonDataLoadingException(
+          "CSV header provided in DDL is not proper. Column names in schema and CSV header are "
+          + "not the same.")
+      } else {
+        LOGGER.error(
+          "CSV File provided is not proper. Column names in schema and csv header are not same. "
--- End diff --

It would be better to say "CSV header in the input file ($csvFile) is not proper."


[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...

2017-01-10 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/518#discussion_r95505900
  
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
@@ -301,4 +304,45 @@ object CommonUtil {
       LOGGER.info(s"mapreduce.input.fileinputformat.split.maxsize: ${ newSplitSize.toString }")
     }
   }
+
+  def getCsvHeaderColumns(carbonLoadModel: CarbonLoadModel): Array[String] = {
+    val delimiter = if (StringUtils.isEmpty(carbonLoadModel.getCsvDelimiter)) {
--- End diff --

I think the delimiter cannot be " ", right? If so, it is better to use `isBlank` instead of `isEmpty`.
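When the header is read from the file, `getCsvHeaderColumns` in the Scala diff lower-cases the first line, splits it on the delimiter, removes double quotes and trims each name. A Java sketch of just that parsing step, under an assumption not confirmed by the thread: `String.split` treats its argument as a regular expression (a raw `|` would split between every character), and since the thread does not show what `CarbonUtil.delimiterConverter` does, this sketch quotes the delimiter itself with `Pattern.quote`. The class name is hypothetical.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper mirroring the header-parsing step of getCsvHeaderColumns.
final class CsvHeaderParser {

  // Splits a header line into normalized column names: lower-cased,
  // double quotes removed, surrounding whitespace trimmed.
  // The delimiter is matched literally, not as a regular expression.
  static String[] parseHeader(String headerLine, String delimiter) {
    String[] columns = headerLine.toLowerCase(Locale.ROOT).split(Pattern.quote(delimiter));
    for (int i = 0; i < columns.length; i++) {
      columns[i] = columns[i].replace("\"", "").trim();
    }
    return columns;
  }
}
```

Using `Locale.ROOT` is a deliberate choice in this sketch; the Scala code calls `toLowerCase()`, which uses the JVM default locale and can lower-case letters such as `I` differently under a Turkish locale.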

