[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by default

2018-11-01 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2886
  
@kevinjmh 
1. When you set a column in INVERTED_INDEX but not in SORT_COLUMNS, the data 
will not be sorted; only RLE will be applied to the data.
2. `isInvertedIndex` alone does not tell you that RLE is applied: RLE is 
applied in both the inverted and non-inverted cases. After checking 
`isInvertedIndex`, the codec decides whether to sort based on the isSort check.
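
A minimal sketch of that decision, assuming simplified boolean flags (the 
class and method names below are hypothetical, not CarbonData source):

```
public class EncodingChoiceSketch {
  enum Encoding { RLE_ONLY, RLE_WITH_SORT_AND_INVERTED_INDEX }

  // A column gets sorted data plus an inverted index only when it is both a
  // sort column and flagged for inverted index; otherwise plain RLE applies.
  static Encoding chooseEncoding(boolean isSortColumn, boolean isInvertedIndex) {
    if (isSortColumn && isInvertedIndex) {
      return Encoding.RLE_WITH_SORT_AND_INVERTED_INDEX;
    }
    return Encoding.RLE_ONLY;
  }

  public static void main(String[] args) {
    // INVERTED_INDEX set but column not in SORT_COLUMNS: unsorted data, RLE only
    System.out.println(chooseEncoding(false, true)); // prints RLE_ONLY
  }
}
```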



---


[GitHub] carbondata issue #2884: [CARBONDATA-3063] Support set and get carbon propert...

2018-11-01 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2884
  
LGTM


---


[jira] [Resolved] (CARBONDATA-3062) Fix Compatibility issue with cache_level as blocklet

2018-11-01 Thread Manish Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Gupta resolved CARBONDATA-3062.
--
   Resolution: Fixed
Fix Version/s: 1.5.1

> Fix Compatibility issue with cache_level as blocklet
> 
>
> Key: CARBONDATA-3062
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3062
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
> Fix For: 1.5.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Please find below steps to reproduce the issue:
>  # Create table and load data in legacy store
>  # In new store, load data and alter table set table properties 
> 'CACHE_LEVEL'='BLOCKLET'
>  # Perform Filter operation on that table and find below Exception
>  
> Error: java.io.IOException: Problem in loading segment blocks. (state=,code=0)
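
A hedged repro sketch of the steps above (the table name, schema, and filter 
are made up; only the CACHE_LEVEL property and the failing filter query come 
from the report):

```
import org.apache.spark.sql.SparkSession;

public class CacheLevelBlockletRepro {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("repro").master("local[1]").getOrCreate();
    // Steps 1-2: the table was created and loaded in the legacy store, then
    // loaded again in the new store before switching the cache level.
    spark.sql("ALTER TABLE t1 SET TBLPROPERTIES('CACHE_LEVEL'='BLOCKLET')");
    // Step 3: a filter query then fails with
    // java.io.IOException: Problem in loading segment blocks.
    spark.sql("SELECT * FROM t1 WHERE c1 = 'x'").show();
  }
}
```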



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CARBONDATA-3062) Fix Compatibility issue with cache_level as blocklet

2018-11-01 Thread Manish Gupta (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Gupta updated CARBONDATA-3062:
-
Issue Type: Bug  (was: Improvement)

> Fix Compatibility issue with cache_level as blocklet
> 
>
> Key: CARBONDATA-3062
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3062
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
> Fix For: 1.5.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Please find below steps to reproduce the issue:
>  # Create table and load data in legacy store
>  # In new store, load data and alter table set table properties 
> 'CACHE_LEVEL'='BLOCKLET'
>  # Perform Filter operation on that table and find below Exception
>  
> Error: java.io.IOException: Problem in loading segment blocks. (state=,code=0)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CARBONDATA-3062) Fix Compatibility issue with cache_level as blocklet

2018-11-01 Thread Indhumathi Muthumurugesh (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3062:
-
Description: 
Please find below steps to reproduce the issue:
 # Create table and load data in legacy store
 # In new store, load data and alter table set table properties 
'CACHE_LEVEL'='BLOCKLET'
 # Perform Filter operation on that table and find below Exception

 
Error: java.io.IOException: Problem in loading segment blocks. (state=,code=0)

  was:
# Create table and load data in legacy store
 # In new store, load data and alter table set table properties 
'CACHE_LEVEL'='BLOCKLET'
 # Perform Select operation on that table and find below Exception

 
Error: java.io.IOException: Problem in loading segment blocks. (state=,code=0)


> Fix Compatibility issue with cache_level as blocklet
> 
>
> Key: CARBONDATA-3062
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3062
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Please find below steps to reproduce the issue:
>  # Create table and load data in legacy store
>  # In new store, load data and alter table set table properties 
> 'CACHE_LEVEL'='BLOCKLET'
>  # Perform Filter operation on that table and find below Exception
>  
> Error: java.io.IOException: Problem in loading segment blocks. (state=,code=0)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-11-01 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230273109
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1723,6 +1724,92 @@ public void 
testReadNextRowWithProjectionAndRowUtil() {
 assertEquals(RowUtil.getFloat(data, 11), (float) 1.23);
 i++;
   }
+  assert  (i == 10);
+  reader.close();
+} catch (Throwable e) {
+  e.printStackTrace();
+  Assert.fail(e.getMessage());
+} finally {
+  try {
+FileUtils.deleteDirectory(new File(path));
+  } catch (IOException e) {
+e.printStackTrace();
+Assert.fail(e.getMessage());
+  }
+}
+  }
+
+  @Test
+  public void testVectorReader() {
+String path = "./testWriteFiles";
+try {
+  FileUtils.deleteDirectory(new File(path));
+
+  Field[] fields = new Field[12];
+  fields[0] = new Field("stringField", DataTypes.STRING);
+  fields[1] = new Field("shortField", DataTypes.SHORT);
+  fields[2] = new Field("intField", DataTypes.INT);
+  fields[3] = new Field("longField", DataTypes.LONG);
+  fields[4] = new Field("doubleField", DataTypes.DOUBLE);
+  fields[5] = new Field("boolField", DataTypes.BOOLEAN);
+  fields[6] = new Field("dateField", DataTypes.DATE);
+  fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
+  fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 
2));
+  fields[9] = new Field("varcharField", DataTypes.VARCHAR);
+  fields[10] = new Field("byteField", DataTypes.BYTE);
+  fields[11] = new Field("floatField", DataTypes.FLOAT);
+  Map<String, String> map = new HashMap<>();
+  map.put("complex_delimiter_level_1", "#");
+  CarbonWriter writer = CarbonWriter.builder()
+  .outputPath(path)
+  .withLoadOptions(map)
+  .withCsvInput(new Schema(fields))
+  .writtenBy("CarbonReaderTest")
+  .build();
+
+  for (int i = 0; i < 10; i++) {
+String[] row2 = new String[]{
+"robot" + (i % 10),
+String.valueOf(i % 1),
+String.valueOf(i),
+String.valueOf(Long.MAX_VALUE - i),
+String.valueOf((double) i / 2),
+String.valueOf(true),
+"2019-03-02",
+"2019-02-12 03:03:34",
+"12.345",
+"varchar",
+String.valueOf(i),
+"1.23"
+};
+writer.write(row2);
+  }
+  writer.close();
+
+  // Read data
+  CarbonReader reader = CarbonReader
+  .builder(path, "_temp")
+  .withVectorReader(true)
+  .build();
+
+  int i = 0;
+  while (reader.hasNext()) {
+Object[] data = (Object[]) reader.readNextRow();
+
+assert (RowUtil.getString(data, 0).equals("robot" + i));
+assertEquals(RowUtil.getShort(data, 4), i);
+assertEquals(RowUtil.getInt(data, 5), i);
+assert (RowUtil.getLong(data, 6) == Long.MAX_VALUE - i);
+assertEquals(RowUtil.getDouble(data, 7), ((double) i) / 2);
+assert (RowUtil.getByte(data, 8).equals(new Byte("1")));
--- End diff --

ok


---


[jira] [Updated] (CARBONDATA-3062) Fix Compatibility issue with cache_level as blocklet

2018-11-01 Thread Indhumathi Muthumurugesh (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3062:
-
Description: 
# Create table and load data in legacy store
 # In new store, load data and alter table set table properties 
'CACHE_LEVEL'='BLOCKLET'
 # Perform Select operation on that table and find below Exception

 
Error: java.io.IOException: Problem in loading segment blocks. (state=,code=0)

> Fix Compatibility issue with cache_level as blocklet
> 
>
> Key: CARBONDATA-3062
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3062
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> # Create table and load data in legacy store
>  # In new store, load data and alter table set table properties 
> 'CACHE_LEVEL'='BLOCKLET'
>  # Perform Select operation on that table and find below Exception
>  
> Error: java.io.IOException: Problem in loading segment blocks. (state=,code=0)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-11-01 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230273015
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java 
---
@@ -158,14 +173,31 @@ public CarbonReaderBuilder withHadoopConf(String key, 
String value) {
 }
 
 try {
-  final List<InputSplit> splits =
-  format.getSplits(new JobContextImpl(job.getConfiguration(), new 
JobID()));
-
+  List<InputSplit> splits;
+  if (filterExpression == null) {
+splits = format.getAllFileSplits(job);
+  } else {
+splits = format.getSplits(new 
JobContextImpl(job.getConfiguration(), new JobID()));
+  }
   List<RecordReader> readers = new ArrayList<>(splits.size());
   for (InputSplit split : splits) {
 TaskAttemptContextImpl attempt =
 new TaskAttemptContextImpl(job.getConfiguration(), new 
TaskAttemptID());
-RecordReader reader = format.createRecordReader(split, attempt);
+RecordReader reader;
+QueryModel queryModel = format.createQueryModel(split, attempt);
+boolean hasComplex = false;
+for (ProjectionDimension projectionDimension : 
queryModel.getProjectionDimensions()) {
+  if (projectionDimension.getDimension().isComplex()) {
--- End diff --

The vectorized reader is not supported for complex types. When CarbonSession 
supports complex types in the vectorized flow, we can enable it for the SDK as 
well.


---


[GitHub] carbondata pull request #2883: [CARBONDATA-3062] Fix Compatibility issue wit...

2018-11-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2883


---


[GitHub] carbondata pull request #2850: [CARBONDATA-3056] Added concurrent reading th...

2018-11-01 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2850#discussion_r230272267
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -114,6 +115,57 @@ public static CarbonReaderBuilder builder(String 
tablePath) {
 return builder(tablePath, tableName);
   }
 
+  /**
+   * Breaks the list of CarbonRecordReaders in CarbonReader into multiple
+   * CarbonReader objects, each iterating through some 'carbondata' files,
+   * and returns that list of CarbonReader objects.
+   *
+   * If the no. of files is greater than maxSplits, then break the
+   * CarbonReader into maxSplits splits, with each split iterating
+   * through >= 1 file.
+   *
+   * If the no. of files is less than maxSplits, then return list of
+   * CarbonReader with size as the no. of files, with each CarbonReader
+   * iterating through exactly one file
+   *
+   * @param maxSplits: Int
+   * @return list of {@link CarbonReader} objects
+   */
+  public List<CarbonReader> split(int maxSplits) throws IOException {
--- End diff --

@ravipesala : Adding this to the builder would break the builder pattern; 
recently we removed arguments from build() and made this a separate API for 
the SDK writer. The reader follows the same approach.
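
For context, a hedged usage sketch of `split` as a separate reader API (the 
path, table name, and split count are made up; the builder calls mirror the 
SDK tests quoted elsewhere in this digest):

```
import java.io.IOException;
import java.util.List;

import org.apache.carbondata.sdk.file.CarbonReader;

public class SplitUsageSketch {
  public static void main(String[] args) throws IOException, InterruptedException {
    CarbonReader reader = CarbonReader.builder("./testWriteFiles", "_temp").build();
    // Break the reader into at most 4 readers, each covering >= 1 carbondata file.
    List<CarbonReader> readers = reader.split(4);
    for (CarbonReader r : readers) {
      while (r.hasNext()) {
        Object[] row = (Object[]) r.readNextRow();
      }
      r.close();
    }
  }
}
```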


---


[GitHub] carbondata issue #2816: [CARBONDATA-3003] Support read batch row in CSDK

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1230/



---


[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2877
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1228/



---


[GitHub] carbondata issue #2875: [CARBONDATA-3038] Refactor dynamic configuration

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2875
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1229/



---


[GitHub] carbondata pull request #2888: [CARBONDATA-3066]add documentation for writte...

2018-11-01 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2888#discussion_r230272135
  
--- Diff: docs/sdk-guide.md ---
@@ -429,6 +429,15 @@ public CarbonWriterBuilder 
withAvroInput(org.apache.avro.Schema avroSchema);
 public CarbonWriterBuilder withJsonInput(Schema carbonSchema);
 ```
 
+```
+/**
+* To support writing the ApplicationName which is writing the carbondata 
file
+* @param application name which is writing the carbondata files
+* @return CarbonWriterBuilder
--- End diff --

this is mandatory because the SDK can be used by different applications: one 
application can write the files and another can read them, so we store this 
info. During load we take it from the Spark application name; the SDK needs to 
provide it explicitly.
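
A minimal writer sketch with `writtenBy` set, assuming the same SDK builder 
calls as the tests quoted elsewhere in this digest (the schema and application 
name are made up):

```
import org.apache.carbondata.core.metadata.datatype.DataTypes;
import org.apache.carbondata.sdk.file.CarbonWriter;
import org.apache.carbondata.sdk.file.Field;
import org.apache.carbondata.sdk.file.Schema;

public class WrittenBySketch {
  public static void main(String[] args) throws Exception {
    Field[] fields = new Field[]{new Field("name", DataTypes.STRING)};
    CarbonWriter writer = CarbonWriter.builder()
        .outputPath("./testWriteFiles")
        .withCsvInput(new Schema(fields))
        .writtenBy("MyIngestApp")   // records which application wrote the files
        .build();
    writer.write(new String[]{"robot0"});
    writer.close();
  }
}
```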


---


[GitHub] carbondata issue #2888: [CARBONDATA-3066]add documentation for writtenBy and...

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2888
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1227/



---


[GitHub] carbondata issue #2883: [CARBONDATA-3062] Fix Compatibility issue with cache...

2018-11-01 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2883
  
LGTM


---


[GitHub] carbondata issue #2875: [CARBONDATA-3038] Refactor dynamic configuration

2018-11-01 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2875
  
retest this please


---


[GitHub] carbondata pull request #2888: [CARBONDATA-3066]add documentation for writte...

2018-11-01 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2888#discussion_r230270106
  
--- Diff: docs/sdk-guide.md ---
@@ -124,7 +124,7 @@ public class TestSdkAvro {
 try {
   CarbonWriter writer = CarbonWriter.builder()
   .outputPath(path)
-  .withAvroInput(new 
org.apache.avro.Schema.Parser().parse(avroSchema)).build();
+  .withAvroInput(new 
org.apache.avro.Schema.Parser().parse(avroSchema))..writtenBy("SDK").build();
--- End diff --

done


---


[GitHub] carbondata pull request #2888: [CARBONDATA-3066]add documentation for writte...

2018-11-01 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2888#discussion_r230270097
  
--- Diff: docs/sdk-guide.md ---
@@ -686,6 +695,16 @@ Find example code at 
[CarbonReaderExample](https://github.com/apache/carbondata/
   public static Schema readSchemaInIndexFile(String indexFilePath);
 ```
 
+```
+  /**
+   * This method return the version details in formatted string by reading 
from carbondata file
+   * @param dataFilePath complete path including carbondata file name
+   * @return string with information of who has written this file in which 
carbondata project version
--- End diff --

done


---


[GitHub] carbondata issue #2875: [CARBONDATA-3038] Refactor dynamic configuration

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2875
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9490/



---


[GitHub] carbondata issue #2875: [CARBONDATA-3038] Refactor dynamic configuration

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2875
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1441/



---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-11-01 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230264659
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1723,6 +1724,92 @@ public void 
testReadNextRowWithProjectionAndRowUtil() {
 assertEquals(RowUtil.getFloat(data, 11), (float) 1.23);
 i++;
   }
+  assert  (i == 10);
+  reader.close();
+} catch (Throwable e) {
+  e.printStackTrace();
+  Assert.fail(e.getMessage());
+} finally {
+  try {
+FileUtils.deleteDirectory(new File(path));
+  } catch (IOException e) {
+e.printStackTrace();
+Assert.fail(e.getMessage());
+  }
+}
+  }
+
+  @Test
+  public void testVectorReader() {
+String path = "./testWriteFiles";
+try {
+  FileUtils.deleteDirectory(new File(path));
+
+  Field[] fields = new Field[12];
+  fields[0] = new Field("stringField", DataTypes.STRING);
+  fields[1] = new Field("shortField", DataTypes.SHORT);
+  fields[2] = new Field("intField", DataTypes.INT);
+  fields[3] = new Field("longField", DataTypes.LONG);
+  fields[4] = new Field("doubleField", DataTypes.DOUBLE);
+  fields[5] = new Field("boolField", DataTypes.BOOLEAN);
+  fields[6] = new Field("dateField", DataTypes.DATE);
+  fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
+  fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 
2));
+  fields[9] = new Field("varcharField", DataTypes.VARCHAR);
+  fields[10] = new Field("byteField", DataTypes.BYTE);
+  fields[11] = new Field("floatField", DataTypes.FLOAT);
+  Map<String, String> map = new HashMap<>();
+  map.put("complex_delimiter_level_1", "#");
+  CarbonWriter writer = CarbonWriter.builder()
+  .outputPath(path)
+  .withLoadOptions(map)
+  .withCsvInput(new Schema(fields))
+  .writtenBy("CarbonReaderTest")
+  .build();
+
+  for (int i = 0; i < 10; i++) {
+String[] row2 = new String[]{
+"robot" + (i % 10),
+String.valueOf(i % 1),
+String.valueOf(i),
+String.valueOf(Long.MAX_VALUE - i),
+String.valueOf((double) i / 2),
+String.valueOf(true),
+"2019-03-02",
+"2019-02-12 03:03:34",
+"12.345",
+"varchar",
+String.valueOf(i),
+"1.23"
+};
+writer.write(row2);
+  }
+  writer.close();
+
+  // Read data
+  CarbonReader reader = CarbonReader
+  .builder(path, "_temp")
+  .withVectorReader(true)
+  .build();
+
+  int i = 0;
+  while (reader.hasNext()) {
+Object[] data = (Object[]) reader.readNextRow();
+
+assert (RowUtil.getString(data, 0).equals("robot" + i));
+assertEquals(RowUtil.getShort(data, 4), i);
+assertEquals(RowUtil.getInt(data, 5), i);
+assert (RowUtil.getLong(data, 6) == Long.MAX_VALUE - i);
+assertEquals(RowUtil.getDouble(data, 7), ((double) i) / 2);
+assert (RowUtil.getByte(data, 8).equals(new Byte("1")));
--- End diff --

Can you support getBoolean? We shouldn't return a byte for a boolean directly.


---


[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by default

2018-11-01 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2886
  
The InvertedIndex/NoInvertedIndex setting is confusing.
1. The value `isInvertedIndex` assigned to the different IndexCodecs in 
`createEncoderForDimensionLegacy` requires us to both put the column in 
SORT_COLUMNS and use INVERTED_INDEX. What if I set it in INVERTED_INDEX but 
not in SORT_COLUMNS?
2. Is the only thing the boolean `isInvertedIndex` in IndexCodec does to 
control whether to do RLE on the data page?

These points make the setting not a direct switch that controls how the data 
is processed.
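
To make point 1 concrete, a hedged DDL sketch of that case (the table name and 
schema are made up): `c1` is listed under INVERTED_INDEX but not under 
SORT_COLUMNS, so per the reply earlier in this digest it stays unsorted and 
only RLE applies.

```
import org.apache.spark.sql.SparkSession;

public class InvertedIndexWithoutSortSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("sketch").master("local[1]").getOrCreate();
    // c1 has INVERTED_INDEX set but is not a sort column; c2 is the sort column.
    spark.sql("CREATE TABLE t2 (c1 STRING, c2 INT) STORED BY 'carbondata' "
        + "TBLPROPERTIES('INVERTED_INDEX'='c1', 'SORT_COLUMNS'='c2')");
  }
}
```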


---


[GitHub] carbondata pull request #2863: [WIP] Optimise decompressing while filling th...

2018-11-01 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2863#discussion_r230257378
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/SnappyCompressor.java
 ---
@@ -90,7 +90,7 @@ public String getName() {
 try {
   uncompressedLength = Snappy.uncompressedLength(compInput, offset, 
length);
   data = new byte[uncompressedLength];
-  Snappy.uncompress(compInput, offset, length, data, 0);
+  snappyNative.rawUncompress(compInput, offset, length, data, 0);
--- End diff --

Is it safe to use the `SnappyNative` class directly? Its documentation says we 
should not use this class directly.


---


[GitHub] carbondata pull request #2863: [WIP] Optimise decompressing while filling th...

2018-11-01 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2863#discussion_r230260860
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java
 ---
@@ -224,135 +239,134 @@ public void decodeAndFillVector(ColumnPage 
columnPage, ColumnVectorInfo vectorIn
   }
 }
 
-private void fillVector(ColumnPage columnPage, CarbonColumnVector 
vector,
-DataType vectorDataType, DataType pageDataType, int pageSize, 
ColumnVectorInfo vectorInfo) {
+private void fillVector(byte[] pageData, CarbonColumnVector vector, 
DataType vectorDataType,
+DataType pageDataType, int pageSize, ColumnVectorInfo vectorInfo, 
BitSet nullBits) {
+  int k = 0;
--- End diff --

Rename `k` to a meaningful name


---


[GitHub] carbondata pull request #2863: [WIP] Optimise decompressing while filling th...

2018-11-01 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2863#discussion_r230258183
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPageValueConverter.java
 ---
@@ -37,5 +40,6 @@
   double decodeDouble(long value);
   double decodeDouble(float value);
   double decodeDouble(double value);
-  void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo 
vectorInfo);
+  void decodeAndFillVector(byte[] pageData, ColumnVectorInfo vectorInfo, 
BitSet nullBits,
+  DataType pageDataType, int pageSize);
--- End diff --

From `ColumnPageDecoder` we directly call `decodeAndFillVector`. So does this 
method `decodeAndFillVector` need to be defined again in the 
`ColumnPageValueConverter.java` interface? I think there is no need to define 
the method again; instead we can fill the vector directly in the decoder 
method call itself. Please check the feasibility of it.


---


[GitHub] carbondata pull request #2863: [WIP] Optimise decompressing while filling th...

2018-11-01 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2863#discussion_r230261176
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/statistics/PrimitivePageStatsCollector.java
 ---
@@ -243,6 +244,11 @@ private int getDecimalCount(double value) {
   int integerPlaces = strValue.indexOf('.');
   if (-1 != integerPlaces) {
 decimalPlaces = strValue.length() - integerPlaces - 1;
+if (decimalPlaces == 1) {
--- End diff --

Please add a comment to explain the scenario for which this code is added. The 
logic is clear, but it is not obvious in which scenario it can lead to an 
error/exception.


---


[GitHub] carbondata pull request #2863: [WIP] Optimise decompressing while filling th...

2018-11-01 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2863#discussion_r230117118
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java
 ---
@@ -244,59 +243,56 @@ public double decodeDouble(long value) {
 }
 
 @Override
-public void decodeAndFillVector(ColumnPage columnPage, 
ColumnVectorInfo vectorInfo) {
+public void decodeAndFillVector(byte[] pageData, ColumnVectorInfo 
vectorInfo, BitSet nullBits,
+DataType pageDataType, int pageSize) {
   CarbonColumnVector vector = vectorInfo.vector;
-  BitSet nullBits = columnPage.getNullBits();
-  DataType pageDataType = columnPage.getDataType();
-  int pageSize = columnPage.getPageSize();
   BitSet deletedRows = vectorInfo.deletedRows;
   DataType vectorDataType = vector.getType();
   vector = ColumnarVectorWrapperDirectFactory
   .getDirectVectorWrapperFactory(vector, null, nullBits, 
deletedRows, true, false);
+  int k = 0;
--- End diff --

Rename `k` to a meaningful name


---


[GitHub] carbondata issue #2884: [CARBONDATA-3063] Support set and get carbon propert...

2018-11-01 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2884
  
@kanaka yes, sorry


---


[GitHub] carbondata issue #2884: [CARBONDATA-3063] Support set and get carbon propert...

2018-11-01 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2884
  
@KanakaKumar CI passed, please check.


---


[GitHub] carbondata pull request #2886: [CARBONDATA-3065]make inverted index false by...

2018-11-01 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2886#discussion_r230260481
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -359,8 +359,13 @@ private CarbonCommonConstants() {
   public static final String TABLE_BLOCKSIZE = "table_blocksize";
   // table blocklet size in MB
   public static final String TABLE_BLOCKLET_SIZE = "table_blocklet_size";
-  // set in column level to disable inverted index
+  /**
+   * set in column level to disable inverted index
+   * @Deprecated :This property is deprecated, it is kep just for 
compatibility
--- End diff --

spelling: kept


---


[GitHub] carbondata issue #2862: [HOTFIX] Enable Local dictionary by default

2018-11-01 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2862
  
please remember to update the doc too.


---


[GitHub] carbondata issue #2875: [CARBONDATA-3038] Refactor dynamic configuration

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2875
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1226/



---


[GitHub] carbondata issue #2884: [CARBONDATA-3063] Support set and get carbon propert...

2018-11-01 Thread kanaka
Github user kanaka commented on the issue:

https://github.com/apache/carbondata/pull/2884
  
@xubo245 I think you meant @KanakaKumar 


---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-11-01 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230256502
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java 
---
@@ -158,14 +173,31 @@ public CarbonReaderBuilder withHadoopConf(String key, 
String value) {
 }
 
 try {
-  final List<InputSplit> splits =
-  format.getSplits(new JobContextImpl(job.getConfiguration(), new 
JobID()));
-
+  List<InputSplit> splits;
+  if (filterExpression == null) {
+splits = format.getAllFileSplits(job);
+  } else {
+splits = format.getSplits(new 
JobContextImpl(job.getConfiguration(), new JobID()));
+  }
   List<RecordReader> readers = new ArrayList<>(splits.size());
   for (InputSplit split : splits) {
 TaskAttemptContextImpl attempt =
 new TaskAttemptContextImpl(job.getConfiguration(), new 
TaskAttemptID());
-RecordReader reader = format.createRecordReader(split, attempt);
+RecordReader reader;
+QueryModel queryModel = format.createQueryModel(split, attempt);
+boolean hasComplex = false;
+for (ProjectionDimension projectionDimension : 
queryModel.getProjectionDimensions()) {
+  if (projectionDimension.getDimension().isComplex()) {
--- End diff --

Can you support Array? SDK/CSDK users need to use this data type.


---


[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...

2018-11-01 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2804
  
@KanakaKumar CI passed, please check


---


[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by default

2018-11-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2886
  
@akashrn5 Please expose these properties from SDK and fileformat as well.


---


[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...

2018-11-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2877
  
LGTM, please fix the CI


---


[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2804
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9489/



---


[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2804
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1440/



---


[GitHub] carbondata issue #2715: [CARBONDATA-2930] Support customize column compresso...

2018-11-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2715
  
LGTM


---


[GitHub] carbondata issue #2890: [CARBONDATA-3002] Fix some spell error

2018-11-01 Thread chenliang613
Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/2890
  
LGTM


---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-11-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230252544
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java
 ---
@@ -88,6 +99,50 @@ public CarbonTable getOrCreateCarbonTable(Configuration 
configuration) throws IO
 }
   }
 
+  /**
+   * This method will list all the carbondata files in the table path and 
treat one carbondata
+   * file as one split.
+   */
+  public List<InputSplit> getAllFileSplits(JobContext job) throws 
IOException {
--- End diff --

Don't add more public methods. Please take a conf property from outside and 
use it to decide whether to go through the datamap or just list the splits 
plainly like this.
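
A hedged sketch of that suggestion: keep a single getSplits entry point and 
drive the choice with a configuration flag (the property name here is 
hypothetical):

```
import org.apache.hadoop.conf.Configuration;

public class SplitModeSketch {
  // Hypothetical flag: when true, skip datamap pruning and list files as splits.
  static boolean listFilesPlainly(Configuration conf) {
    return conf.getBoolean("carbon.sdk.plain.file.splits", false);
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setBoolean("carbon.sdk.plain.file.splits", true);
    System.out.println(listFilesPlainly(conf)); // true -> plain file listing
  }
}
```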


---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-11-01 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230252316
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonVectorizedRecordReader.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.hadoop.util;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants;
+import org.apache.carbondata.core.datastore.block.TableBlockInfo;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.DecimalType;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import org.apache.carbondata.core.scan.executor.QueryExecutor;
+import org.apache.carbondata.core.scan.executor.QueryExecutorFactory;
+import 
org.apache.carbondata.core.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.core.scan.model.ProjectionDimension;
+import org.apache.carbondata.core.scan.model.ProjectionMeasure;
+import org.apache.carbondata.core.scan.model.QueryModel;
+import 
org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnarBatch;
+import 
org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+import org.apache.carbondata.core.util.ByteUtil;
+import org.apache.carbondata.hadoop.AbstractRecordReader;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.log4j.Logger;
+
+/**
+ * A specialized RecordReader that reads into CarbonColumnarBatches 
directly using the
+ * carbondata column APIs and fills the data directly into columns.
+ */
+public class CarbonVectorizedRecordReader extends 
AbstractRecordReader {
+
+  private static final Logger LOGGER =
+  
LogServiceFactory.getLogService(CarbonVectorizedRecordReader.class.getName());
+
+  private CarbonColumnarBatch carbonColumnarBatch;
+
+  private QueryExecutor queryExecutor;
+
+  private int batchIdx = 0;
+
+  private int numBatched = 0;
+
+  private AbstractDetailQueryResultIterator iterator;
+
+  private QueryModel queryModel;
+
+  public CarbonVectorizedRecordReader(QueryModel queryModel) {
+this.queryModel = queryModel;
+  }
+
+  @Override public void initialize(InputSplit inputSplit, 
TaskAttemptContext taskAttemptContext)
+  throws IOException, InterruptedException {
+List<CarbonInputSplit> splitList;
+if (inputSplit instanceof CarbonInputSplit) {
+  splitList = new ArrayList<>(1);
+  splitList.add((CarbonInputSplit) inputSplit);
+} else {
+  throw new RuntimeException("unsupported input split type: " + 
inputSplit);
+}
+List<TableBlockInfo> tableBlockInfoList = 
CarbonInputSplit.createBlocks(splitList);
+queryModel.setTableBlockInfos(tableBlockInfoList);
+queryModel.setVectorReader(true);
+try {
+  queryExecutor =
+  QueryExecutorFactory.getQueryExecutor(queryModel, 
taskAttemptContext.getConfiguration());
+  iterator = (AbstractDetailQueryResultIterator) 
queryExecutor.execute(queryModel);
+  initBatch();
+} catch (QueryExecutionException e) {
+  LOGGER.error(e);
+  throw new InterruptedException(e.getMessage());
+} catch (Exception e) {
+  LOGGER.error(e);
+  throw e;
+}
+  }
+
+  @Override public boolean nextKeyValue() throws IOException, 
InterruptedException {
+if 


---

[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-11-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230251863
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonVectorizedRecordReader.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.hadoop.util;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants;
+import org.apache.carbondata.core.datastore.block.TableBlockInfo;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.DecimalType;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import org.apache.carbondata.core.scan.executor.QueryExecutor;
+import org.apache.carbondata.core.scan.executor.QueryExecutorFactory;
+import 
org.apache.carbondata.core.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.core.scan.model.ProjectionDimension;
+import org.apache.carbondata.core.scan.model.ProjectionMeasure;
+import org.apache.carbondata.core.scan.model.QueryModel;
+import 
org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnarBatch;
+import 
org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+import org.apache.carbondata.core.util.ByteUtil;
+import org.apache.carbondata.hadoop.AbstractRecordReader;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.log4j.Logger;
+
+/**
+ * A specialized RecordReader that reads into CarbonColumnarBatches 
directly using the
+ * carbondata column APIs and fills the data directly into columns.
+ */
+public class CarbonVectorizedRecordReader extends 
AbstractRecordReader {
+
+  private static final Logger LOGGER =
+  
LogServiceFactory.getLogService(CarbonVectorizedRecordReader.class.getName());
+
+  private CarbonColumnarBatch carbonColumnarBatch;
+
+  private QueryExecutor queryExecutor;
+
+  private int batchIdx = 0;
+
+  private int numBatched = 0;
+
+  private AbstractDetailQueryResultIterator iterator;
+
+  private QueryModel queryModel;
+
+  public CarbonVectorizedRecordReader(QueryModel queryModel) {
+this.queryModel = queryModel;
+  }
+
+  @Override public void initialize(InputSplit inputSplit, 
TaskAttemptContext taskAttemptContext)
+  throws IOException, InterruptedException {
+List<CarbonInputSplit> splitList;
+if (inputSplit instanceof CarbonInputSplit) {
+  splitList = new ArrayList<>(1);
+  splitList.add((CarbonInputSplit) inputSplit);
+} else {
+  throw new RuntimeException("unsupported input split type: " + 
inputSplit);
+}
+List<TableBlockInfo> tableBlockInfoList = 
CarbonInputSplit.createBlocks(splitList);
+queryModel.setTableBlockInfos(tableBlockInfoList);
+queryModel.setVectorReader(true);
+try {
+  queryExecutor =
+  QueryExecutorFactory.getQueryExecutor(queryModel, 
taskAttemptContext.getConfiguration());
+  iterator = (AbstractDetailQueryResultIterator) 
queryExecutor.execute(queryModel);
+} catch (QueryExecutionException e) {
+  LOGGER.error(e);
+  throw new InterruptedException(e.getMessage());
+} catch (Exception e) {
+  LOGGER.error(e);
+  throw e;
+}
+  }
+
+  @Override public boolean nextKeyValue() throws IOException, 
InterruptedException {
+initBatch();
+if 


---

[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-11-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230251542
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java
 ---
@@ -88,6 +99,50 @@ public CarbonTable getOrCreateCarbonTable(Configuration 
configuration) throws IO
 }
   }
 
+  /**
+   * This method will list all the carbondata files in the table path and 
treat one carbondata
+   * file as one split.
+   */
+  public List<InputSplit> getAllFileSplits(JobContext job) throws 
IOException {
+List<InputSplit> splits = new ArrayList<>();
+CarbonTable carbonTable = 
getOrCreateCarbonTable(job.getConfiguration());
+if (null == carbonTable) {
+  throw new IOException("Missing/Corrupt schema file for table.");
+}
+for (CarbonFile carbonFile : 
getAllCarbonDataFiles(carbonTable.getTablePath())) {
+  CarbonInputSplit split =
+  new CarbonInputSplit("null", new 
Path(carbonFile.getAbsolutePath()), 0,
+  carbonFile.getLength(), carbonFile.getLocations(), 
FileFormat.COLUMNAR_V3);
+  split.setVersion(ColumnarFormatVersion.V3);
+  BlockletDetailInfo info = new BlockletDetailInfo();
+  split.setDetailInfo(info);
+  info.setBlockSize(carbonFile.getLength());
+  // Read the footer offset and set.
+  FileReader reader = FileFactory
--- End diff --

Reading the file footer offset should not be inside getSplits, as it will 
increase the getSplits time when there are many files. It should be handled 
during record reader initialization on the executor side, or inside the thread.


---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-11-01 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230251185
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1737,4 +1738,89 @@ public void 
testReadNextRowWithProjectionAndRowUtil() {
 }
   }
 
+  @Test
+  public void testVectorReader() {
+String path = "./testWriteFiles";
+try {
+  FileUtils.deleteDirectory(new File(path));
+
+  Field[] fields = new Field[12];
+  fields[0] = new Field("stringField", DataTypes.STRING);
+  fields[1] = new Field("shortField", DataTypes.SHORT);
+  fields[2] = new Field("intField", DataTypes.INT);
+  fields[3] = new Field("longField", DataTypes.LONG);
+  fields[4] = new Field("doubleField", DataTypes.DOUBLE);
+  fields[5] = new Field("boolField", DataTypes.BOOLEAN);
+  fields[6] = new Field("dateField", DataTypes.DATE);
+  fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
+  fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 
2));
+  fields[9] = new Field("varcharField", DataTypes.VARCHAR);
+  fields[10] = new Field("byteField", DataTypes.BYTE);
+  fields[11] = new Field("floatField", DataTypes.FLOAT);
+  Map<String, String> map = new HashMap<>();
+  map.put("complex_delimiter_level_1", "#");
+  CarbonWriter writer = CarbonWriter.builder()
+  .outputPath(path)
+  .withLoadOptions(map)
+  .withCsvInput(new Schema(fields))
+  .writtenBy("CarbonReaderTest")
+  .build();
+
+  for (int i = 0; i < 10; i++) {
+String[] row2 = new String[]{
+"robot" + (i % 10),
+String.valueOf(i % 1),
+String.valueOf(i),
+String.valueOf(Long.MAX_VALUE - i),
+String.valueOf((double) i / 2),
+String.valueOf(true),
+"2019-03-02",
+"2019-02-12 03:03:34",
+"12.345",
+"varchar",
+String.valueOf(i),
+"1.23"
+};
+writer.write(row2);
+  }
+  writer.close();
+
+  // Read data
+  CarbonReader reader = CarbonReader
+  .builder(path, "_temp")
+  .withVectorReader(true)
+  .build();
+
+  int i = 0;
+  while (reader.hasNext()) {
+Object[] data = (Object[]) reader.readNextRow();
+
+assert (RowUtil.getString(data, 0).equals("robot" + i));
+assertEquals(RowUtil.getShort(data, 4), i);
+assertEquals(RowUtil.getInt(data, 5), i);
+assert (RowUtil.getLong(data, 6) == Long.MAX_VALUE - i);
+assertEquals(RowUtil.getDouble(data, 7), ((double) i) / 2);
+assert (RowUtil.getByte(data, 8).equals(new Byte("1")));
+assertEquals(RowUtil.getInt(data, 1), 17957);
+assertEquals(RowUtil.getLong(data, 2), 154992081400L);
+assert (RowUtil.getDecimal(data, 9).equals("12.35"));
+assert (RowUtil.getString(data, 3).equals("varchar"));
+assertEquals(RowUtil.getByte(data, 10), new 
Byte(String.valueOf(i)));
+assertEquals(RowUtil.getFloat(data, 11), new Float("1.23"));
+i++;
+  }
+  reader.close();
--- End diff --

where?


---


[jira] [Closed] (CARBONDATA-3068) cannot load data from hdfs files without hdfs prefix

2018-11-01 Thread lianganping (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lianganping closed CARBONDATA-3068.
---

> cannot load data from hdfs files without hdfs prefix
> 
>
> Key: CARBONDATA-3068
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3068
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.5.0
>Reporter: lianganping
>Assignee: lianganping
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> sql:
> LOAD DATA INPATH '/tmp/test.csv' INTO TABLE test 
> OPTIONS('QUOTECHAR'='"','TIMESTAMPFORMAT'='/MM/dd HH:mm:ss');
> error:
> org.apache.carbondata.processing.exception.DataLoadingException: The input 
> file does not exist: /tmp/test.csv (state=,code=0)
> but the file "test.csv" is in hdfs path, and hadoop conf "core-site.xml" has 
> the property:
> <property>
>   <name>fs.defaultFS</name>
>   <value>hdfs://master:9000</value>
> </property>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CARBONDATA-3068) cannot load data from hdfs files without hdfs prefix

2018-11-01 Thread lianganping (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lianganping updated CARBONDATA-3068:

Description: 
sql:
LOAD DATA INPATH '/tmp/test.csv' INTO TABLE test 
OPTIONS('QUOTECHAR'='"','TIMESTAMPFORMAT'='/MM/dd HH:mm:ss');
error:
org.apache.carbondata.processing.exception.DataLoadingException: The input file 
does not exist: /tmp/test.csv (state=,code=0)
but the file "test.csv" is in hdfs path, and hadoop conf "core-site.xml" has 
the property:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>

> cannot load data from hdfs files without hdfs prefix
> 
>
> Key: CARBONDATA-3068
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3068
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.5.0
>Reporter: lianganping
>Assignee: lianganping
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> sql:
> LOAD DATA INPATH '/tmp/test.csv' INTO TABLE test 
> OPTIONS('QUOTECHAR'='"','TIMESTAMPFORMAT'='/MM/dd HH:mm:ss');
> error:
> org.apache.carbondata.processing.exception.DataLoadingException: The input 
> file does not exist: /tmp/test.csv (state=,code=0)
> but the file "test.csv" is in hdfs path, and hadoop conf "core-site.xml" has 
> the property:
> <property>
>   <name>fs.defaultFS</name>
>   <value>hdfs://master:9000</value>
> </property>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CARBONDATA-3068) cannot load data from hdfs files without hdfs prefix

2018-11-01 Thread lianganping (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lianganping resolved CARBONDATA-3068.
-
Resolution: Fixed

> cannot load data from hdfs files without hdfs prefix
> 
>
> Key: CARBONDATA-3068
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3068
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.5.0
>Reporter: lianganping
>Assignee: lianganping
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> sql:
> LOAD DATA INPATH '/tmp/test.csv' INTO TABLE test 
> OPTIONS('QUOTECHAR'='"','TIMESTAMPFORMAT'='/MM/dd HH:mm:ss');
> error:
> org.apache.carbondata.processing.exception.DataLoadingException: The input 
> file does not exist: /tmp/test.csv (state=,code=0)
> but the file "test.csv" is in hdfs path, and hadoop conf "core-site.xml" has 
> the property:
> <property>
>   <name>fs.defaultFS</name>
>   <value>hdfs://master:9000</value>
> </property>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (CARBONDATA-3068) cannot load data from hdfs files without hdfs prefix

2018-11-01 Thread lianganping (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lianganping closed CARBONDATA-3068.
---

> cannot load data from hdfs files without hdfs prefix
> 
>
> Key: CARBONDATA-3068
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3068
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.5.0
>Reporter: lianganping
>Assignee: lianganping
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (CARBONDATA-3068) cannot load data from hdfs files without hdfs prefix

2018-11-01 Thread lianganping (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lianganping reopened CARBONDATA-3068:
-

> cannot load data from hdfs files without hdfs prefix
> 
>
> Key: CARBONDATA-3068
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3068
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.5.0
>Reporter: lianganping
>Assignee: lianganping
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CARBONDATA-3068) cannot load data from hdfs files without hdfs prefix

2018-11-01 Thread lianganping (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lianganping resolved CARBONDATA-3068.
-
Resolution: Fixed

> cannot load data from hdfs files without hdfs prefix
> 
>
> Key: CARBONDATA-3068
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3068
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.5.0
>Reporter: lianganping
>Assignee: lianganping
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #2888: [CARBONDATA-3066]add documentation for writte...

2018-11-01 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2888#discussion_r230247864
  
--- Diff: docs/sdk-guide.md ---
@@ -429,6 +429,15 @@ public CarbonWriterBuilder 
withAvroInput(org.apache.avro.Schema avroSchema);
 public CarbonWriterBuilder withJsonInput(Schema carbonSchema);
 ```
 
+```
+/**
+* To support writing the ApplicationName which is writing the carbondata 
file
+* @param application name which is writing the carbondata files
+* @return CarbonWriterBuilder
--- End diff --

why is this mandatory? Why not use 'anonymous' instead?


---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-11-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230247574
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java
 ---
@@ -347,9 +347,7 @@ private void fillVector(ColumnPage columnPage, 
CarbonColumnVector vector,
 columnPage.getNullBits());
   } else if (vectorDataType == DataTypes.FLOAT) {
 float[] floatPage = columnPage.getFloatPage();
-for (int i = 0; i < pageSize; i++) {
-  vector.putFloats(0, pageSize, floatPage, 0);
-}
+vector.putFloats(0, pageSize, floatPage, 0);
--- End diff --

Lines 322 to 334 can also be removed, as they are duplicated.


---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-11-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230247332
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java
 ---
@@ -524,6 +524,22 @@ public DataOutputStream 
getDataOutputStreamUsingAppend(String path, FileFactory.
 return getFiles(listStatus);
   }
 
+  @Override public List<CarbonFile> listFiles(Boolean recursive, 
CarbonFileFilter fileFilter)
--- End diff --

Move `@Override` down and add documentation for the public method.


---


[GitHub] carbondata pull request #2891: [CARBONDATA-3068] fixed cannot load data from...

2018-11-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2891


---


[GitHub] carbondata pull request #2850: [CARBONDATA-3056] Added concurrent reading th...

2018-11-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2850#discussion_r230246531
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -114,6 +115,57 @@ public static CarbonReaderBuilder builder(String 
tablePath) {
 return builder(tablePath, tableName);
   }
 
+  /**
+   * Breaks the list of CarbonRecordReaders in CarbonReader into multiple
+   * CarbonReader objects, each iterating through some 'carbondata' files,
+   * and returns that list of CarbonReader objects.
+   *
+   * If the no. of files is greater than maxSplits, then break the
+   * CarbonReader into maxSplits splits, with each split iterating
+   * through >= 1 file.
+   *
+   * If the no. of files is less than maxSplits, then return list of
+   * CarbonReader with size as the no. of files, with each CarbonReader
+   * iterating through exactly one file
+   *
+   * @param maxSplits: Int
+   * @return list of {@link CarbonReader} objects
+   */
+  public List<CarbonReader> split(int maxSplits) throws IOException {
+validateReader();
+if (maxSplits < 1) {
+  throw new RuntimeException(
+  this.getClass().getSimpleName() + ".split: maxSplits must be 
positive");
+}
+
+List<CarbonReader> carbonReaders = new ArrayList<>();
+
+if (maxSplits < this.readers.size()) {
--- End diff --

Add a UT just for this method to make sure the splits happen correctly with 
multiple combinations of maxSplits and reader counts.
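
A hedged sketch of such a UT, checking only the count-level contract from the 
Javadoc above (`splitCount` is a hypothetical helper mirroring that contract, 
not CarbonData code):

```
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class SplitContractTest {
  // Expected number of readers per the split(maxSplits) Javadoc.
  private static int splitCount(int numFiles, int maxSplits) {
    return Math.min(numFiles, maxSplits);
  }

  @Test
  public void testSplitCombinations() {
    assertEquals(4, splitCount(10, 4)); // more files than maxSplits: cap at maxSplits
    assertEquals(3, splitCount(3, 8));  // fewer files: one reader per file
    assertEquals(1, splitCount(1, 1));
  }
}
```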


---


[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2804
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1225/



---


[GitHub] carbondata issue #2891: [CARBONDATA-3068] fixed cannot load data from hdfs f...

2018-11-01 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2891
  
LGTM


---


[GitHub] carbondata issue #2889: [CARBONDATA-3067] Add check for debug to avoid strin...

2018-11-01 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2889
  
@jackylk Do you want to change that to `debug`? Currently it is at info level, 
which is not in the scope of this PR, since info logs may always be logged and 
that concatenation is unavoidable.
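
For reference, the usual shape of such a guard, as a minimal sketch (the 
logger and message are made up):

```
import org.apache.log4j.Logger;

public class DebugGuardSketch {
  private static final Logger LOGGER = Logger.getLogger(DebugGuardSketch.class);

  static void logBlock(String blockPath, long length) {
    // Guard so the string concatenation only happens when debug is enabled.
    if (LOGGER.isDebugEnabled()) {
      LOGGER.debug("Loaded block " + blockPath + " with length " + length);
    }
  }
}
```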


---


[GitHub] carbondata pull request #2850: [CARBONDATA-3056] Added concurrent reading th...

2018-11-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2850#discussion_r230244674
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -114,6 +115,57 @@ public static CarbonReaderBuilder builder(String 
tablePath) {
 return builder(tablePath, tableName);
   }
 
+  /**
+   * Breaks the list of CarbonRecordReaders in CarbonReader into multiple
+   * CarbonReader objects, each iterating through some 'carbondata' files,
+   * and returns that list of CarbonReader objects.
+   *
+   * If the no. of files is greater than maxSplits, then break the
+   * CarbonReader into maxSplits splits, with each split iterating
+   * through >= 1 file.
+   *
+   * If the no. of files is less than maxSplits, then return list of
+   * CarbonReader with size as the no. of files, with each CarbonReader
+   * iterating through exactly one file
+   *
+   * @param maxSplits: Int
+   * @return list of {@link CarbonReader} objects
+   */
+  public List<CarbonReader> split(int maxSplits) throws IOException {
--- End diff --

I feel this method should be moved to the builder. Add another method 
`build(int splits)` to the builder and return a List of readers.


---
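
Whether split() stays on the reader or moves to the builder as suggested, the
underlying partitioning is the same; a minimal generic sketch of that logic
(not the PR's code):

import java.util.ArrayList;
import java.util.List;

public class SplitUtil {
  // Partition 'items' into at most 'maxSplits' round-robin groups: with more
  // items than maxSplits each group gets >= 1 item, otherwise one item per group.
  static <T> List<List<T>> partition(List<T> items, int maxSplits) {
    if (maxSplits < 1) {
      throw new IllegalArgumentException("maxSplits must be positive");
    }
    int groups = Math.min(maxSplits, items.size());
    List<List<T>> result = new ArrayList<>(groups);
    for (int i = 0; i < groups; i++) {
      result.add(new ArrayList<T>());
    }
    for (int i = 0; i < items.size(); i++) {
      result.get(i % groups).add(items.get(i)); // round-robin keeps sizes within 1
    }
    return result;
  }
}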


[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...

2018-11-01 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2804
  
retest this please


---


[GitHub] carbondata issue #2884: [CARBONDATA-3063] Support set and get carbon propert...

2018-11-01 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2884
  
@kanaka CI passed, please check.


---


[GitHub] carbondata issue #2885: [CARBONDATA-3064] Support separate audit log

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2885
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9484/



---


[GitHub] carbondata issue #2885: [CARBONDATA-3064] Support separate audit log

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2885
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1435/



---


[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2877
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9488/



---


[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2877
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1439/



---


[GitHub] carbondata issue #2884: [CARBONDATA-3063] Support set and get carbon propert...

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2884
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9485/



---


[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2804
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9486/



---


[GitHub] carbondata issue #2884: [CARBONDATA-3063] Support set and get carbon propert...

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2884
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1436/



---


[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2804
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1437/



---


[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2877
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1224/



---


[GitHub] carbondata issue #2875: [CARBONDATA-3038] Refactor dynamic configuration

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2875
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9487/



---


[GitHub] carbondata issue #2875: [CARBONDATA-3038] Refactor dynamic configuration

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2875
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1223/



---


[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9483/



---


[GitHub] carbondata issue #2875: [CARBONDATA-3038] Refactor dynamic configuration

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2875
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1438/



---


[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1434/



---


[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2804
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1222/



---


[GitHub] carbondata issue #2885: [CARBONDATA-3064] Support separate audit log

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2885
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1220/



---


[GitHub] carbondata issue #2884: [CARBONDATA-3063] Support set and get carbon propert...

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2884
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1221/



---


[GitHub] carbondata issue #2885: [CARBONDATA-3064] Support separate audit log

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2885
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9482/



---


[GitHub] carbondata issue #2891: [CARBONDATA-3068] fixed cannot load data from hdfs f...

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2891
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1430/



---


[GitHub] carbondata issue #2891: [CARBONDATA-3068] fixed cannot load data from hdfs f...

2018-11-01 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2891
  
LGTM


---


[GitHub] carbondata issue #2842: [CARBONDATA-3032] Remove carbon.blocklet.size from p...

2018-11-01 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2842
  
@jackylk I updated, please check.


---


[GitHub] carbondata issue #2887: [HOTFIX] Throw original exception in thread pool

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2887
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9480/



---


[GitHub] carbondata issue #2891: [CARBONDATA-3068] fixed cannot load data from hdfs f...

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2891
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9479/



---


[GitHub] carbondata issue #2885: [CARBONDATA-3064] Support separate audit log

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2885
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1218/



---


[GitHub] carbondata issue #2885: [CARBONDATA-3064] Support separate audit log

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2885
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1433/



---


[GitHub] carbondata issue #2887: [HOTFIX] Throw original exception in thread pool

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2887
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1431/



---


[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1432/



---


[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1217/



---


[GitHub] carbondata issue #2887: [HOTFIX] Throw original exception in thread pool

2018-11-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2887
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1216/



---


[GitHub] carbondata pull request #2885: [CARBONDATA-3064] Support separate audit log

2018-11-01 Thread Sssan520
Github user Sssan520 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2885#discussion_r230032701
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/Auditor.java ---
@@ -0,0 +1,216 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.processing.util;
+
+import java.io.IOException;
+import java.text.SimpleDateFormat;
+import java.util.Date;
+import java.util.Map;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.logging.impl.AuditLevel;
+
+import com.google.gson.Gson;
+import com.google.gson.GsonBuilder;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.log4j.Logger;
+
+/**
+ * Audit logger.
+ * User can configure log4j to log to a separate file. For example
+ *
+ *  log4j.logger.carbon.audit=DEBUG, audit
+ *  log4j.appender.audit=org.apache.log4j.FileAppender
+ *  log4j.appender.audit.File=/opt/logs/audit.out
+ *  log4j.appender.audit.Threshold=AUDIT
+ *  log4j.appender.audit.Append=false
+ *  log4j.appender.audit.layout=org.apache.log4j.PatternLayout
+ *  log4j.appender.audit.layout.ConversionPattern=%m%n
+ */
+@InterfaceAudience.Internal
+public class Auditor {
+  private static final Logger LOGGER = Logger.getLogger("carbon.audit");
+  private static String username;
+
+  static {
+try {
+  username = UserGroupInformation.getCurrentUser().getShortUserName();
+} catch (IOException e) {
+  username = "unknown";
+}
+  }
+
+  public static void logOperationStart(String opName, String opId) {
+OpStartMessage message = new OpStartMessage(opName, opId);
+Gson gson = new GsonBuilder().disableHtmlEscaping().create();
+String json = gson.toJson(message);
+LOGGER.log(AuditLevel.AUDIT, json);
+  }
+
+  public static void logOperationSuccess(String opName, String opId, String table,
+      String opTime, Map<String, String> extraInfo) {
+OpEndMessage message = new OpEndMessage(opName, opId, table, opTime,
+OpStatus.SUCCESS, extraInfo);
+Gson gson = new GsonBuilder().disableHtmlEscaping().create();
--- End diff --

> ok, it seems GSON is thread-safe, so I changed it to a static member
Yes, there is a fix PR: https://github.com/google/gson/issues/63



---
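
To make the quoted Auditor API concrete, a caller would plausibly look like the
sketch below; the operation name, table, and extra-info values are illustrative:

import java.util.HashMap;
import java.util.Map;

import org.apache.carbondata.processing.util.Auditor;

public class AuditorUsageExample {
  public static void main(String[] args) {
    String opId = String.valueOf(System.nanoTime()); // operation id, as in the PR
    long start = System.nanoTime();

    Auditor.logOperationStart("LOAD DATA", opId);

    Map<String, String> extraInfo = new HashMap<>();
    extraInfo.put("SegmentId", "0"); // illustrative key/value

    String opTime = (System.nanoTime() - start) / 1000L / 1000L + " ms";
    Auditor.logOperationSuccess("LOAD DATA", opId, "default.sales", opTime, extraInfo);
  }
}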


[GitHub] carbondata pull request #2885: [CARBONDATA-3064] Support separate audit log

2018-11-01 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2885#discussion_r230031804
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/Auditor.java ---
@@ -0,0 +1,216 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.processing.util;
+
+import java.io.IOException;
+import java.text.SimpleDateFormat;
+import java.util.Date;
+import java.util.Map;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.logging.impl.AuditLevel;
+
+import com.google.gson.Gson;
+import com.google.gson.GsonBuilder;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.log4j.Logger;
+
+/**
+ * Audit logger.
+ * User can configure log4j to log to a separate file. For example
+ *
+ *  log4j.logger.carbon.audit=DEBUG, audit
+ *  log4j.appender.audit=org.apache.log4j.FileAppender
+ *  log4j.appender.audit.File=/opt/logs/audit.out
+ *  log4j.appender.audit.Threshold=AUDIT
+ *  log4j.appender.audit.Append=false
+ *  log4j.appender.audit.layout=org.apache.log4j.PatternLayout
+ *  log4j.appender.audit.layout.ConversionPattern=%m%n
+ */
+@InterfaceAudience.Internal
+public class Auditor {
+  private static final Logger LOGGER = Logger.getLogger("carbon.audit");
+  private static String username;
+
+  static {
+try {
+  username = UserGroupInformation.getCurrentUser().getShortUserName();
+} catch (IOException e) {
+  username = "unknown";
+}
+  }
+
+  public static void logOperationStart(String opName, String opId) {
+OpStartMessage message = new OpStartMessage(opName, opId);
+Gson gson = new GsonBuilder().disableHtmlEscaping().create();
+String json = gson.toJson(message);
+LOGGER.log(AuditLevel.AUDIT, json);
+  }
+
+  public static void logOperationSuccess(String opName, String opId, String table,
+      String opTime, Map<String, String> extraInfo) {
+OpEndMessage message = new OpEndMessage(opName, opId, table, opTime,
+OpStatus.SUCCESS, extraInfo);
+Gson gson = new GsonBuilder().disableHtmlEscaping().create();
+String json = gson.toJson(message);
+LOGGER.log(AuditLevel.AUDIT, json);
+  }
+
+  public static void logOperationFailed(String opName, String opId, String table,
+      String opTime, Map<String, String> extraInfo) {
+OpEndMessage message = new OpEndMessage(opName, opId, table, opTime,
+OpStatus.FAILED, extraInfo);
+Gson gson = new GsonBuilder().disableHtmlEscaping().create();
+String json = gson.toJson(message);
+LOGGER.log(AuditLevel.AUDIT, json);
+  }
+
+  private enum OpStatus {
+// operation started
+START,
+
+// operation succeed
+SUCCESS,
+
+// operation failed
+FAILED
+  }
+
+  // log message for operation start, it is written as a JSON record in 
the audit log
+  private static class OpStartMessage {
+private String time;
+private String username;
+private String opName;
+private String opId;
+private OpStatus opStatus;
+
+OpStartMessage(String opName, String opId) {
+  SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
--- End diff --

ok, will fix


---
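
The review comment above is truncated in the archive; if the concern is the
usual SimpleDateFormat one (a new instance per message, and no thread safety
if shared), java.time offers a thread-safe alternative, sketched here as an
assumption rather than the actual fix:

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampExample {
  // DateTimeFormatter is immutable and thread-safe, so it can be a shared
  // constant, unlike SimpleDateFormat which must not be shared across threads.
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

  static String now() {
    return LocalDateTime.now().format(FORMAT);
  }
}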


[GitHub] carbondata pull request #2885: [CARBONDATA-3064] Support separate audit log

2018-11-01 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2885#discussion_r230031744
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/Auditor.java ---
@@ -0,0 +1,216 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.processing.util;
+
+import java.io.IOException;
+import java.text.SimpleDateFormat;
+import java.util.Date;
+import java.util.Map;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.logging.impl.AuditLevel;
+
+import com.google.gson.Gson;
+import com.google.gson.GsonBuilder;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.log4j.Logger;
+
+/**
+ * Audit logger.
+ * User can configure log4j to log to a separate file. For example
+ *
+ *  log4j.logger.carbon.audit=DEBUG, audit
+ *  log4j.appender.audit=org.apache.log4j.FileAppender
+ *  log4j.appender.audit.File=/opt/logs/audit.out
+ *  log4j.appender.audit.Threshold=AUDIT
+ *  log4j.appender.audit.Append=false
+ *  log4j.appender.audit.layout=org.apache.log4j.PatternLayout
+ *  log4j.appender.audit.layout.ConversionPattern=%m%n
+ */
+@InterfaceAudience.Internal
+public class Auditor {
+  private static final Logger LOGGER = Logger.getLogger("carbon.audit");
+  private static String username;
+
+  static {
+try {
+  username = UserGroupInformation.getCurrentUser().getShortUserName();
+} catch (IOException e) {
+  username = "unknown";
+}
+  }
+
+  public static void logOperationStart(String opName, String opId) {
+OpStartMessage message = new OpStartMessage(opName, opId);
+Gson gson = new GsonBuilder().disableHtmlEscaping().create();
+String json = gson.toJson(message);
+LOGGER.log(AuditLevel.AUDIT, json);
+  }
+
+  public static void logOperationSuccess(String opName, String opId, String table,
+      String opTime, Map<String, String> extraInfo) {
+OpEndMessage message = new OpEndMessage(opName, opId, table, opTime,
+OpStatus.SUCCESS, extraInfo);
+Gson gson = new GsonBuilder().disableHtmlEscaping().create();
--- End diff --

ok, it seems GSON is thread-safe, so I changed it to a static member


---
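
Concretely, "change it to a static member" would look roughly like this sketch
(not the merged code):

import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

public class GsonHolderExample {
  // Gson is thread-safe, so a single shared instance can replace the
  // per-call GsonBuilder().create() in each log method.
  private static final Gson GSON = new GsonBuilder().disableHtmlEscaping().create();

  static String toJson(Object message) {
    return GSON.toJson(message);
  }
}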


[GitHub] carbondata pull request #2885: [CARBONDATA-3064] Support separate audit log

2018-11-01 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2885#discussion_r230030708
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/Auditor.java ---
@@ -0,0 +1,216 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.processing.util;
+
+import java.io.IOException;
+import java.text.SimpleDateFormat;
+import java.util.Date;
+import java.util.Map;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.logging.impl.AuditLevel;
+
+import com.google.gson.Gson;
+import com.google.gson.GsonBuilder;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.log4j.Logger;
+
+/**
+ * Audit logger.
+ * User can configure log4j to log to a separate file. For example
+ *
+ *  log4j.logger.carbon.audit=DEBUG, audit
+ *  log4j.appender.audit=org.apache.log4j.FileAppender
+ *  log4j.appender.audit.File=/opt/logs/audit.out
+ *  log4j.appender.audit.Threshold=AUDIT
+ *  log4j.appender.audit.Append=false
+ *  log4j.appender.audit.layout=org.apache.log4j.PatternLayout
+ *  log4j.appender.audit.layout.ConversionPattern=%m%n
+ */
+@InterfaceAudience.Internal
+public class Auditor {
+  private static final Logger LOGGER = Logger.getLogger("carbon.audit");
+  private static String username;
+
+  static {
+try {
+  username = UserGroupInformation.getCurrentUser().getShortUserName();
+} catch (IOException e) {
+  username = "unknown";
+}
+  }
+
+  public static void logOperationStart(String opName, String opId) {
+OpStartMessage message = new OpStartMessage(opName, opId);
+Gson gson = new GsonBuilder().disableHtmlEscaping().create();
--- End diff --

I think gson will keep the quoted string.
The gson documentation says: "By default, Gson escapes HTML characters such 
as < > etc. Use this option to configure Gson to pass-through HTML characters 
as is."


---
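
The difference disableHtmlEscaping makes is easy to see in isolation
(illustrative snippet, not from the PR):

import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

public class EscapingExample {
  public static void main(String[] args) {
    Gson escaping = new Gson();
    Gson passThrough = new GsonBuilder().disableHtmlEscaping().create();

    // default Gson escapes HTML-sensitive characters as unicode escapes
    System.out.println(escaping.toJson("a<b>"));     // "a\u003cb\u003e"
    // disableHtmlEscaping keeps them as-is, which reads better in audit logs
    System.out.println(passThrough.toJson("a<b>"));  // "a<b>"
  }
}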


[GitHub] carbondata pull request #2885: [CARBONDATA-3064] Support separate audit log

2018-11-01 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2885#discussion_r230030119
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/Auditor.java ---
@@ -0,0 +1,216 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.processing.util;
+
+import java.io.IOException;
+import java.text.SimpleDateFormat;
+import java.util.Date;
+import java.util.Map;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.logging.impl.AuditLevel;
+
+import com.google.gson.Gson;
+import com.google.gson.GsonBuilder;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.log4j.Logger;
+
+/**
+ * Audit logger.
+ * User can configure log4j to log to a separate file. For example
+ *
+ *  log4j.logger.carbon.audit=DEBUG, audit
+ *  log4j.appender.audit=org.apache.log4j.FileAppender
+ *  log4j.appender.audit.File=/opt/logs/audit.out
+ *  log4j.appender.audit.Threshold=AUDIT
+ *  log4j.appender.audit.Append=false
+ *  log4j.appender.audit.layout=org.apache.log4j.PatternLayout
+ *  log4j.appender.audit.layout.ConversionPattern=%m%n
+ */
+@InterfaceAudience.Internal
+public class Auditor {
+  private static final Logger LOGGER = Logger.getLogger("carbon.audit");
+  private static String username;
+
+  static {
+try {
+  username = UserGroupInformation.getCurrentUser().getShortUserName();
+} catch (IOException e) {
+  username = "unknown";
+}
+  }
+
+  public static void logOperationStart(String opName, String opId) {
+OpStartMessage message = new OpStartMessage(opName, opId);
+Gson gson = new GsonBuilder().disableHtmlEscaping().create();
+String json = gson.toJson(message);
+LOGGER.log(AuditLevel.AUDIT, json);
+  }
+
+  public static void logOperationSuccess(String opName, String opId, String table,
+      String opTime, Map<String, String> extraInfo) {
+OpEndMessage message = new OpEndMessage(opName, opId, table, opTime,
+OpStatus.SUCCESS, extraInfo);
+Gson gson = new GsonBuilder().disableHtmlEscaping().create();
+String json = gson.toJson(message);
+LOGGER.log(AuditLevel.AUDIT, json);
+  }
+
+  public static void logOperationFailed(String opName, String opId, String table,
--- End diff --

ok, changed to logOperationEnd


---


[GitHub] carbondata pull request #2885: [CARBONDATA-3064] Support separate audit log

2018-11-01 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2885#discussion_r230028786
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/partition/CarbonAlterTableAddHivePartitionCommand.scala
 ---
@@ -55,6 +55,8 @@ case class CarbonAlterTableAddHivePartitionCommand(
 
   override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
 table = CarbonEnv.getCarbonTable(tableName)(sparkSession)
+setAuditTable(table)
+setAuditInfo(Map("partition" -> partitionSpecsAndLocs.mkString(",")))
--- End diff --

ok


---


[GitHub] carbondata pull request #2885: [CARBONDATA-3064] Support separate audit log

2018-11-01 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2885#discussion_r230028618
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/package.scala
 ---
@@ -60,21 +63,71 @@ trait DataProcessOperation {
   def processData(sparkSession: SparkSession): Seq[Row]
 }
 
+/**
+ * An utility that run the command with audit log
+ */
+trait Auditable {
+  // operation id that will be written in audit log
+  private val operationId: String = String.valueOf(System.nanoTime())
+
+  // extra info to be written in audit log, set by subclass of 
AtomicRunnableCommand
+  private var auditInfo: java.util.Map[String, String] = new java.util.HashMap[String, String]()
+
+  // holds the dbName and tableName for which this command is executed for
+  // used for audit log, set by subclass of AtomicRunnableCommand
+  private var table: String = _
+
+  // implement by subclass, return the operation name that record in audit 
log
+  protected def opName: String
+
+  protected def opTime(startTime: Long) = s"${(System.nanoTime() - startTime) / 1000L / 1000L} ms"
--- End diff --

ok


---
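
Putting the Auditable trait and the Auditor together, the run-with-audit flow
presumably resembles this rough Java rendering of the Scala logic; names such
as runWithAudit are assumptions for illustration:

import java.util.HashMap;
import java.util.Map;

import org.apache.carbondata.processing.util.Auditor;

public abstract class AuditableCommand {
  private final String operationId = String.valueOf(System.nanoTime());
  private final Map<String, String> auditInfo = new HashMap<>();
  private String table = "unknown";

  protected abstract String opName();      // e.g. "ALTER TABLE ADD PARTITION"
  protected abstract void run() throws Exception;

  protected String opTime(long startNanos) {
    return (System.nanoTime() - startNanos) / 1000L / 1000L + " ms";
  }

  protected void setAuditTable(String table) { this.table = table; }
  protected void setAuditInfo(Map<String, String> info) { auditInfo.putAll(info); }

  public final void runWithAudit() throws Exception {
    long start = System.nanoTime();
    Auditor.logOperationStart(opName(), operationId);
    try {
      run();
      Auditor.logOperationSuccess(opName(), operationId, table, opTime(start), auditInfo);
    } catch (Exception e) {
      // a later review comment renames this to logOperationEnd
      Auditor.logOperationFailed(opName(), operationId, table, opTime(start), auditInfo);
      throw e;
    }
  }
}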

