[jira] [Commented] (CARBONDATA-276) Add trim option

2016-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625774#comment-15625774
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r85957803
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java
 ---
@@ -472,6 +475,7 @@ public boolean processRow(StepMetaInterface smi, 
StepDataInterface sdi) throws K
   break;
   }
 }
+<<< HEAD
--- End diff --

Does this file have an unresolved merge conflict? The added line looks like a conflict marker.


> Add trim option
> ---
>
> Key: CARBONDATA-276
> URL: https://issues.apache.org/jira/browse/CARBONDATA-276
> Project: CarbonData
> Issue Type: Bug
> Reporter: Lionx
> Assignee: Lionx
> Priority: Minor
>
> Fix a bug and add trim option.
> Bug: When a string contains leading or trailing whitespace, the query 
> result is null. This is because the dictionary ignores the leading and 
> trailing whitespace and the csvInput does not.
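
The mismatch described in the issue can be sketched with a plain map standing in for the dictionary. This is only an illustration under stated assumptions: TrimMismatchDemo and its methods are hypothetical names, not CarbonData classes, and the real dictionary lookup may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the dictionary; not CarbonData's actual classes.
final class TrimMismatchDemo {

  // Builds a dictionary whose keys are trimmed, assigning surrogate keys 1, 2, ...
  static Map<String, Integer> buildTrimmedDictionary(String... rawValues) {
    Map<String, Integer> dictionary = new HashMap<>();
    for (String raw : rawValues) {
      String key = raw.trim();
      if (!dictionary.containsKey(key)) {
        dictionary.put(key, dictionary.size() + 1);
      }
    }
    return dictionary;
  }

  // Looks up a CSV field, optionally trimming it first; returns null when the key is absent.
  static Integer lookup(Map<String, Integer> dictionary, String csvField, boolean trim) {
    return dictionary.get(trim ? csvField.trim() : csvField);
  }

  public static void main(String[] args) {
    Map<String, Integer> dictionary = buildTrimmedDictionary(" apple ", "pear");
    // The untrimmed field misses because the dictionary key was trimmed.
    System.out.println(lookup(dictionary, " apple ", false));
    // Trimming the field the same way as the dictionary makes the lookup succeed.
    System.out.println(lookup(dictionary, " apple ", true));
  }
}
```

The point of the sketch is that both sides have to apply the same whitespace rule; which side changes is a design choice.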



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-276) Add trim option

2016-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625769#comment-15625769
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r85957411
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenMeta.java
 ---
@@ -1694,5 +1699,19 @@ public void setTableOption(String tableOption) {
   public TableOptionWrapper getTableOptionWrapper() {
 return tableOptionWrapper;
   }
+
+  public String getIsUseTrim() {
+return isUseTrim;
+  }
+
+  public void setIsUseTrim(Boolean[] isUseTrim) {
+for (Boolean flag: isUseTrim) {
+  if (flag) {
+this.isUseTrim += "T";
--- End diff --

Use TRUE/FALSE instead of single-letter flags for better readability.
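
One way this suggestion could look, as a rough sketch only: TrimFlagEncoder, the comma separator, and treating a null flag as FALSE are all assumptions made for illustration and are not taken from the pull request.

```java
import java.util.StringJoiner;

// Hypothetical helper; the class name, separator and null handling are assumptions.
final class TrimFlagEncoder {

  // Encodes per-column trim flags as "TRUE,FALSE,..."; a null flag is treated as FALSE.
  static String encode(Boolean[] flags) {
    StringJoiner joined = new StringJoiner(",");
    for (Boolean flag : flags) {
      joined.add(Boolean.TRUE.equals(flag) ? "TRUE" : "FALSE");
    }
    return joined.toString();
  }

  // Decodes the string produced by encode back into primitive flags.
  static boolean[] decode(String encoded) {
    if (encoded.isEmpty()) {
      return new boolean[0];
    }
    String[] parts = encoded.split(",");
    boolean[] flags = new boolean[parts.length];
    for (int i = 0; i < parts.length; i++) {
      flags[i] = Boolean.parseBoolean(parts[i]);
    }
    return flags;
  }
}
```

Building a fresh string on each call also avoids appending to a previous value if the setter is invoked more than once.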




[jira] [Commented] (CARBONDATA-276) Add trim option

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569112#comment-15569112
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r83039531
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestDataLoadWithTrimOption.scala
 ---
@@ -0,0 +1,78 @@
+package org.apache.carbondata.spark.testsuite.dataload
+
+import java.io.File
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.spark.sql.common.util.CarbonHiveContext._
+import org.apache.spark.sql.common.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+import org.apache.spark.sql.Row
+
+/**
+  * Created by x00381807 on 2016/9/26.
--- End diff --

Oh, my fault




[jira] [Commented] (CARBONDATA-276) Add trim option

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568332#comment-15568332
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82977592
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
 parserSettings.setMaxColumns(
 getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), 
csvParserVo.getMaxColumns()));
 parserSettings.setNullValue("");
-parserSettings.setIgnoreLeadingWhitespaces(false);
-parserSettings.setIgnoreTrailingWhitespaces(false);
+parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

An advantage of this approach: suppose a user loads dirty data in one load, 
then realizes trimming is needed and enables the option for the next load. 
That would increase the dictionary size, and the dictionary lookup overhead 
during queries would increase as well.
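
The dictionary growth mentioned above can be illustrated with a small sketch. DictionaryGrowthDemo is a hypothetical class that uses a map as the dictionary; it only shows that loading the same values once untrimmed and once trimmed creates separate entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a dictionary that assigns one surrogate key per distinct value.
final class DictionaryGrowthDemo {
  private final Map<String, Integer> dictionary = new LinkedHashMap<>();

  // Simulates one data load; the trim flag is chosen per load.
  void load(boolean trim, String... csvFields) {
    for (String field : csvFields) {
      String value = trim ? field.trim() : field;
      dictionary.putIfAbsent(value, dictionary.size() + 1);
    }
  }

  int size() {
    return dictionary.size();
  }

  public static void main(String[] args) {
    DictionaryGrowthDemo demo = new DictionaryGrowthDemo();
    demo.load(false, " apple", "pear "); // first load keeps the whitespace
    demo.load(true, " apple", "pear ");  // second load trims the same fields
    System.out.println(demo.size());     // entries for both the dirty and the trimmed forms
  }
}
```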




[jira] [Commented] (CARBONDATA-276) Add trim option

2016-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568196#comment-15568196
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82968804
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
 parserSettings.setMaxColumns(
 getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), 
csvParserVo.getMaxColumns()));
 parserSettings.setNullValue("");
-parserSettings.setIgnoreLeadingWhitespaces(false);
-parserSettings.setIgnoreTrailingWhitespaces(false);
+parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

One more point: it would be better to set this property at column level when 
creating the table, as a column property. That saves the user from providing 
this option on every data load.
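
A minimal sketch of the column-level idea, assuming the trim setting is stored once with the table definition and consulted on each load. ColumnTrimProperties and its methods are hypothetical and do not reflect how CarbonData stores column properties.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for per-column trim settings fixed at table creation.
final class ColumnTrimProperties {
  private final Map<String, Boolean> trimByColumn = new HashMap<>();

  // Called once when the table is created.
  ColumnTrimProperties withTrim(String column, boolean trim) {
    trimByColumn.put(column, trim);
    return this;
  }

  // Called for every field on every load; columns without a setting are left untouched.
  String normalize(String column, String csvField) {
    return trimByColumn.getOrDefault(column, false) ? csvField.trim() : csvField;
  }
}
```

With a setting like this, every load of a column uses the same rule, so the user does not need to pass a trim option per load.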




[jira] [Commented] (CARBONDATA-276) Add trim option

2016-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565038#comment-15565038
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82758567
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
 parserSettings.setMaxColumns(
 getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), 
csvParserVo.getMaxColumns()));
 parserSettings.setNullValue("");
-parserSettings.setIgnoreLeadingWhitespaces(false);
-parserSettings.setIgnoreTrailingWhitespaces(false);
+parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

You need to handle the same thing in CarbonFilters.scala: spaces are not 
trimmed while processing filter expressions, so that path also needs to take 
your feature into account.
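
The filter-side concern can be sketched as follows. FilterTrimDemo is a hypothetical example and is not the logic in CarbonFilters.scala; it only shows that an equality filter matches trimmed stored values when the literal is normalized with the same rule.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical equality filter over stored values that were trimmed at load time.
final class FilterTrimDemo {

  // Returns the stored values equal to the literal, trimming the literal when trim is enabled.
  static List<String> filterEquals(List<String> storedValues, String literal, boolean trim) {
    String needle = trim ? literal.trim() : literal;
    return storedValues.stream().filter(needle::equals).collect(Collectors.toList());
  }
}
```

For example, with stored values "apple" and "pear", the literal " apple " matches nothing when trim is false and matches "apple" when trim is true.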




[jira] [Commented] (CARBONDATA-276) Add trim option

2016-09-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15535115#comment-15535115
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r81281079
  
--- Diff: 
hadoop/src/test/java/org/apache/carbondata/hadoop/test/util/StoreCreator.java 
---
@@ -465,6 +466,7 @@ private static void generateGraph(IDataProcessStatus 
schmaModel, SchemaInfo info
 model.setEscapeCharacter(schmaModel.getEscapeCharacter());
 model.setQuoteCharacter(schmaModel.getQuoteCharacter());
 model.setCommentCharacter(schmaModel.getCommentCharacter());
+model.setTrim(schmaModel.getTrim());
--- End diff --

The use case is not clear. Can you clarify it by providing more details?




[jira] [Commented] (CARBONDATA-276) Add trim option

2016-09-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15535114#comment-15535114
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r81281050
  
--- Diff: 
hadoop/src/test/java/org/apache/carbondata/hadoop/test/util/StoreCreator.java 
---
@@ -465,6 +466,7 @@ private static void generateGraph(IDataProcessStatus 
schmaModel, SchemaInfo info
 model.setEscapeCharacter(schmaModel.getEscapeCharacter());
 model.setQuoteCharacter(schmaModel.getQuoteCharacter());
 model.setCommentCharacter(schmaModel.getCommentCharacter());
+model.setTrim(schmaModel.getTrim());
--- End diff --

Why do we need to set this in the schema model?

