[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625774#comment-15625774 ]

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r85957803

--- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java ---
@@ -472,6 +475,7 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K
           break;
         }
       }
+<<< HEAD
--- End diff --

Does this file have an unresolved merge conflict?

> Add trim option
> ---
>
> Key: CARBONDATA-276
> URL: https://issues.apache.org/jira/browse/CARBONDATA-276
> Project: CarbonData
> Issue Type: Bug
> Reporter: Lionx
> Assignee: Lionx
> Priority: Minor
>
> Fix a bug and add trim option.
> Bug: When a string contains leading or trailing whitespace, the query
> result is null. This is because the dictionary ignores leading and
> trailing whitespace, while the CSV input does not.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
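The null result described in the issue can be illustrated with a small model. This is a hypothetical sketch, not CarbonData code: the class `TrimMismatchDemo` and its methods are invented for illustration, and it assumes the dictionary stores keys with `String.trim()` applied while the CSV input step passes values through unchanged, as the issue describes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the CARBONDATA-276 mismatch, not CarbonData code:
// dictionary keys are stored trimmed, while the CSV input passes raw values through.
final class TrimMismatchDemo {
    private final Map<String, Integer> dictionary = new HashMap<>();

    // Assign the next surrogate key to the trimmed value if it is not present yet.
    // Note: String.trim() strips only characters <= U+0020.
    void addToDictionary(String raw) {
        dictionary.putIfAbsent(raw.trim(), dictionary.size() + 1);
    }

    // Look up the surrogate key of a CSV value; null means "no dictionary entry",
    // mirroring the null query result the issue reports.
    Integer lookup(String csvValue, boolean trimInput) {
        String key = trimInput ? csvValue.trim() : csvValue;
        return dictionary.get(key);
    }
}
```

Looking up `"  abc "` without trimming misses the dictionary entry `"abc"`, while trimming first returns surrogate key 1.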
[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625769#comment-15625769 ]

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r85957411

--- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenMeta.java ---
@@ -1694,5 +1699,19 @@ public void setTableOption(String tableOption) {
   public TableOptionWrapper getTableOptionWrapper() {
     return tableOptionWrapper;
   }
+
+  public String getIsUseTrim() {
+    return isUseTrim;
+  }
+
+  public void setIsUseTrim(Boolean[] isUseTrim) {
+    for (Boolean flag: isUseTrim) {
+      if (flag) {
+        this.isUseTrim += "T";
--- End diff --

Use TRUE/FALSE instead of single letters for better readability.
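A sketch of the reviewer's suggestion, assuming the per-column flags are kept as a comma-separated string. `TrimFlagCodec`, its methods, and the comma separator are hypothetical choices, not part of the pull request. Building the value in a local `StringJoiner` also keeps the result independent of whatever the field held before the setter was called; with `+=`, a `String` field that starts out as `null` would produce text beginning with `null`.

```java
import java.util.StringJoiner;

// Hypothetical helper: serialize per-column trim flags as "TRUE"/"FALSE"
// tokens instead of single letters, as suggested in the review.
final class TrimFlagCodec {
    private static final String SEPARATOR = ",";

    static String encode(Boolean[] flags) {
        StringJoiner joiner = new StringJoiner(SEPARATOR);
        for (Boolean flag : flags) {
            // Treat a null entry as "no trim" instead of unboxing it,
            // which would throw a NullPointerException.
            joiner.add(Boolean.TRUE.equals(flag) ? "TRUE" : "FALSE");
        }
        return joiner.toString();
    }

    static boolean[] decode(String encoded) {
        if (encoded == null || encoded.isEmpty()) {
            return new boolean[0];
        }
        String[] tokens = encoded.split(SEPARATOR);
        boolean[] flags = new boolean[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Boolean.parseBoolean is case-insensitive and returns false for anything but "true".
            flags[i] = Boolean.parseBoolean(tokens[i]);
        }
        return flags;
    }
}
```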
[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569112#comment-15569112 ]

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r83039531

--- Diff: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestDataLoadWithTrimOption.scala ---
@@ -0,0 +1,78 @@
+package org.apache.carbondata.spark.testsuite.dataload
+
+import java.io.File
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.spark.sql.common.util.CarbonHiveContext._
+import org.apache.spark.sql.common.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+import org.apache.spark.sql.Row
+
+/**
+ * Created by x00381807 on 2016/9/26.
--- End diff --

Oh, my fault.
[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568332#comment-15568332 ]

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82977592

--- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
     parserSettings.setMaxColumns(
         getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), csvParserVo.getMaxColumns()));
     parserSettings.setNullValue("");
-    parserSettings.setIgnoreLeadingWhitespaces(false);
-    parserSettings.setIgnoreTrailingWhitespaces(false);
+    parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

One drawback of this approach: suppose a user loads dirty (untrimmed) data in one load, then realizes trimming is needed and enables the option for the next load. The dictionary will then contain both the untrimmed and the trimmed variants of the same values, which increases dictionary size and also increases dictionary lookup overhead at query time.
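The dictionary growth the reviewer describes can be shown with a toy dictionary modeled as a set of distinct values. This is a hypothetical illustration (`DictionaryGrowthDemo` is invented, not CarbonData code), assuming each load either trims all of its values or none of them.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: if an earlier load kept whitespace and a later
// load trims, both variants of each value end up as separate dictionary entries.
final class DictionaryGrowthDemo {
    static Set<String> buildDictionary(List<List<String>> loads, List<Boolean> trimPerLoad) {
        // LinkedHashSet keeps distinct values in insertion order, like appended dictionary entries.
        Set<String> dictionary = new LinkedHashSet<>();
        for (int i = 0; i < loads.size(); i++) {
            boolean trim = trimPerLoad.get(i);
            for (String value : loads.get(i)) {
                dictionary.add(trim ? value.trim() : value);
            }
        }
        return dictionary;
    }
}
```

With two identical loads of `" abc"` and `"def "`, trimming only the second load yields four entries (`" abc"`, `"def "`, `"abc"`, `"def"`), while trimming both loads yields two.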
[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568196#comment-15568196 ]

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82968804

--- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
     parserSettings.setMaxColumns(
         getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), csvParserVo.getMaxColumns()));
     parserSettings.setNullValue("");
-    parserSettings.setIgnoreLeadingWhitespaces(false);
-    parserSettings.setIgnoreTrailingWhitespaces(false);
+    parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

One more point: it would be better to set this property at the column level when the table is created, as a column property. That way the user does not have to provide the option on every data load.
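A minimal sketch of the column-level alternative, assuming the trim setting is recorded once per column when the table is created and then consulted on every load. `TableTrimProperties` is hypothetical and is not a CarbonData API; matching column names case-insensitively is only an illustrative choice.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical per-column trim property stored with the table definition,
// so individual loads do not need to repeat the option.
final class TableTrimProperties {
    private final Map<String, Boolean> trimByColumn = new HashMap<>();

    // Recorded once, e.g. when the table is created.
    void setTrim(String column, boolean trim) {
        trimByColumn.put(column.toLowerCase(Locale.ROOT), trim);
    }

    // Applied to each value during every load; columns without the property keep whitespace.
    String normalize(String column, String value) {
        boolean trim = trimByColumn.getOrDefault(column.toLowerCase(Locale.ROOT), false);
        return (trim && value != null) ? value.trim() : value;
    }
}
```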
[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565038#comment-15565038 ]

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82758567

--- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
     parserSettings.setMaxColumns(
         getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), csvParserVo.getMaxColumns()));
     parserSettings.setNullValue("");
-    parserSettings.setIgnoreLeadingWhitespaces(false);
-    parserSettings.setIgnoreTrailingWhitespaces(false);
+    parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

You need to handle the same case in CarbonFilters.scala as well. Spaces are not trimmed while filter expressions are processed, so the filter path also has to take the trim option into account for this feature.
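For filters, the same rule would apply to the literal side of a comparison. The sketch below is hypothetical (`FilterLiteralNormalizer` is not taken from CarbonData or from `CarbonFilters.scala`) and assumes an equality filter resolved through a dictionary lookup: if the column was loaded with trimming, the literal is trimmed before the lookup, so `' abc '` resolves to the stored entry `abc`.

```java
import java.util.Map;

// Hypothetical sketch: trim the literal of an equality filter before the
// dictionary lookup when the column was loaded with the trim option enabled.
final class FilterLiteralNormalizer {
    static Integer resolveSurrogate(Map<String, Integer> dictionary,
                                    String literal, boolean columnTrimmed) {
        // Only normalize when the stored values were trimmed; otherwise
        // whitespace in the literal is significant and must be kept.
        String key = columnTrimmed ? literal.trim() : literal;
        return dictionary.get(key);
    }
}
```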
[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15535115#comment-15535115 ]

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r81281079

--- Diff: hadoop/src/test/java/org/apache/carbondata/hadoop/test/util/StoreCreator.java ---
@@ -465,6 +466,7 @@ private static void generateGraph(IDataProcessStatus schmaModel, SchemaInfo info
     model.setEscapeCharacter(schmaModel.getEscapeCharacter());
     model.setQuoteCharacter(schmaModel.getQuoteCharacter());
     model.setCommentCharacter(schmaModel.getCommentCharacter());
+    model.setTrim(schmaModel.getTrim());
--- End diff --

The use case is not clear to me. Can you clarify it by providing more details?
[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15535114#comment-15535114 ]

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r81281050

--- Diff: hadoop/src/test/java/org/apache/carbondata/hadoop/test/util/StoreCreator.java ---
@@ -465,6 +466,7 @@ private static void generateGraph(IDataProcessStatus schmaModel, SchemaInfo info
     model.setEscapeCharacter(schmaModel.getEscapeCharacter());
     model.setQuoteCharacter(schmaModel.getQuoteCharacter());
     model.setCommentCharacter(schmaModel.getCommentCharacter());
+    model.setTrim(schmaModel.getTrim());
--- End diff --

Why do we need to set this in the schema model?