[jira] [Commented] (CARBONDATA-308) Use CarbonInputFormat in CarbonScanRDD compute
[ https://issues.apache.org/jira/browse/CARBONDATA-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627358#comment-15627358 ]

ASF GitHub Bot commented on CARBONDATA-308:
---

Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r86058188

    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java ---
    @@ -22,28 +22,44 @@ import java.io.DataOutput;
     import java.io.IOException;
     import java.io.Serializable;
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos;
    +import org.apache.carbondata.core.carbon.datastore.block.Distributable;
    +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
    +import org.apache.carbondata.core.carbon.path.CarbonTablePath;
     import org.apache.hadoop.fs.Path;
     import org.apache.hadoop.io.Writable;
     import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    +
     /**
      * Carbon input split to allow distributed read of CarbonInputFormat.
      */
    -public class CarbonInputSplit extends FileSplit implements Serializable, Writable {
    +public class CarbonInputSplit extends FileSplit implements Distributable, Serializable, Writable {
       private static final long serialVersionUID = 3520344046772190207L;
       private String segmentId;
    -  /**
    +  public String taskId = "0";
    +
    +  /*
        * Number of BlockLets in a block
        */
       private int numberOfBlocklets = 0;
    -  public CarbonInputSplit() {
    -    super(null, 0, 0, new String[0]);
    +  public CarbonInputSplit() {
       }
    -  public CarbonInputSplit(String segmentId, Path path, long start, long length,
    +  private void parserPath(Path path) {
    +    String[] nameParts = path.getName().split("-");
    +    if (nameParts != null && nameParts.length >= 3) {
    +      this.taskId = nameParts[2];
    +    }
    +  }
    +
    +  private CarbonInputSplit(String segmentId, Path path, long start, long length,
    --- End diff --

    please initialize taskId

> Use CarbonInputFormat in CarbonScanRDD compute
> --
>
>            Key: CARBONDATA-308
>            URL: https://issues.apache.org/jira/browse/CARBONDATA-308
>        Project: CarbonData
>     Issue Type: Sub-task
>     Components: spark-integration
>       Reporter: Jacky Li
>        Fix For: 0.2.0-incubating
>
>
> Take CarbonScanRDD as the target RDD and modify it as follows:
> 1. On the driver side, only getSplit is required, so only the filter condition is
> needed; there is no need to create the full QueryModel object, so creation
> of QueryModel can move from the driver side to the executor side.
> 2. Use CarbonInputFormat.createRecordReader in CarbonScanRDD.compute instead
> of using QueryExecutor directly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
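
For readers following the review comment above, here is a minimal sketch of the file-name parsing that `parserPath` performs in the quoted diff. It takes a plain `String` instead of a Hadoop `Path` so that it runs without dependencies. The class name `TaskIdSketch`, the helper `taskIdFromFileName`, and the sample file names are hypothetical and only illustrate the split-on-dash logic together with the `"0"` default.

```java
// Sketch only: mirrors the split("-") logic from the quoted diff.
// TaskIdSketch and taskIdFromFileName are hypothetical names.
class TaskIdSketch {
    // Same default as the `taskId = "0"` initializer in the diff.
    static final String DEFAULT_TASK_ID = "0";

    // Returns the third dash-separated part of the file name,
    // or the default when the name has fewer than three parts.
    public static String taskIdFromFileName(String fileName) {
        String[] nameParts = fileName.split("-");
        return nameParts.length >= 3 ? nameParts[2] : DEFAULT_TASK_ID;
    }

    public static void main(String[] args) {
        // Hypothetical file names, chosen only to exercise both branches.
        System.out.println(taskIdFromFileName("part-0-7-1478000000000.carbondata")); // prints 7
        System.out.println(taskIdFromFileName("data.carbondata"));                   // prints 0
    }
}
```

Returning a default from the helper means the field always has a value, even when the file name does not follow the expected pattern.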
[jira] [Commented] (CARBONDATA-353) Update doc for dateformat option
[ https://issues.apache.org/jira/browse/CARBONDATA-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627280#comment-15627280 ]

ASF GitHub Bot commented on CARBONDATA-353:
---

Github user lion-x commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/272#discussion_r86058866

    --- Diff: docs/DML-Operations-on-Carbon.md ---
    @@ -91,12 +91,17 @@ Following are the options that can be used in load data:
         ```ruby
         OPTIONS('ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary')
         ```
    -- **COLUMNDICT:** dictionary file path for single column.
    +- **COLUMNDICT:** Dictionary file path for each column.
         ```ruby
         OPTIONS('COLUMNDICT'='column1:dictionaryFilePath1, column2:dictionaryFilePath2')
         ```
         Note: ALL_DICTIONARY_PATH and COLUMNDICT can't be used together.
    +- **DATEFORMAT:** Date format for each column.
    +
    +```ruby
    +OPTIONS('DATEFORMAT'='column1:dateFormat1, column2:dateFormat2')
    --- End diff --

    I added a note referring to the Java SimpleDateFormat class documentation, which provides more details.

> Update doc for dateformat option
> --
>
>            Key: CARBONDATA-353
>            URL: https://issues.apache.org/jira/browse/CARBONDATA-353
>        Project: CarbonData
>     Issue Type: Improvement
>       Reporter: Lionx
>       Assignee: Lionx
>       Priority: Minor
>
> Update doc for dateformat option
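
Since the comment above points to `java.text.SimpleDateFormat`, the following sketch shows how a value shaped like `'column1:dateFormat1, column2:dateFormat2'` could be split into per-column patterns and checked with that class. It is an illustration, not CarbonData's actual option parser. The class `DateFormatOptionSketch`, both helper methods, and the sample patterns are assumptions. Each entry is split on its first colon only, because patterns such as `HH:mm:ss` contain colons themselves. Patterns that contain commas are not handled.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: hypothetical helper, not the CarbonData implementation.
class DateFormatOptionSketch {
    // Splits "col1:pattern1, col2:pattern2" into column -> pattern.
    // Each entry is split on its FIRST colon so "HH:mm:ss" survives intact.
    public static Map<String, String> parseDateFormatOption(String optionValue) {
        Map<String, String> formats = new LinkedHashMap<>();
        for (String entry : optionValue.split(",")) {
            String trimmed = entry.trim();
            int colon = trimmed.indexOf(':');
            if (colon <= 0) {
                continue; // skip entries without a column name
            }
            formats.put(trimmed.substring(0, colon).trim(),
                        trimmed.substring(colon + 1).trim());
        }
        return formats;
    }

    // Returns true when the value parses under the given SimpleDateFormat pattern.
    public static boolean matches(String pattern, String value) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false); // reject out-of-range fields such as month 13
        try {
            format.parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Sample patterns are assumptions chosen for illustration.
        Map<String, String> formats =
            parseDateFormatOption("column1:yyyy-MM-dd, column2:yyyy/MM/dd HH:mm:ss");
        System.out.println(formats.get("column2"));
        System.out.println(matches(formats.get("column1"), "2016-11-02"));
        System.out.println(matches(formats.get("column1"), "02/11/2016"));
    }
}
```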
[jira] [Created] (CARBONDATA-355) Remove unnecessary method argument columnIdentifier of PathService.getCarbonTablePath
He Xiaoqiao created CARBONDATA-355:
---

         Summary: Remove unnecessary method argument columnIdentifier of PathService.getCarbonTablePath
             Key: CARBONDATA-355
             URL: https://issues.apache.org/jira/browse/CARBONDATA-355
         Project: CarbonData
      Issue Type: Improvement
      Components: core
Affects Versions: 0.2.0-incubating
        Reporter: He Xiaoqiao
        Assignee: He Xiaoqiao
        Priority: Minor

Remove the columnIdentifier argument of PathService#getCarbonTablePath, since it is not necessary to pass columnIdentifier when getting the table path.
[jira] [Commented] (CARBONDATA-353) Update doc for dateformat option
[ https://issues.apache.org/jira/browse/CARBONDATA-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625795#comment-15625795 ]

ASF GitHub Bot commented on CARBONDATA-353:
---

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/272#discussion_r85959032

    --- Diff: docs/DML-Operations-on-Carbon.md ---
    @@ -91,12 +91,17 @@ Following are the options that can be used in load data:
         ```ruby
         OPTIONS('ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary')
         ```
    -- **COLUMNDICT:** dictionary file path for single column.
    +- **COLUMNDICT:** Dictionary file path for each column.
         ```ruby
         OPTIONS('COLUMNDICT'='column1:dictionaryFilePath1, column2:dictionaryFilePath2')
         ```
         Note: ALL_DICTIONARY_PATH and COLUMNDICT can't be used together.
    +- **DATEFORMAT:** Date format for each column.
    +
    +```ruby
    +OPTIONS('DATEFORMAT'='column1:dateFormat1, column2:dateFormat2')
    --- End diff --

    give an example of the data format

> Update doc for dateformat option
> --
>
>            Key: CARBONDATA-353
>            URL: https://issues.apache.org/jira/browse/CARBONDATA-353
>        Project: CarbonData
>     Issue Type: Improvement
>       Reporter: Lionx
>       Assignee: Lionx
>       Priority: Minor
>
> Update doc for dateformat option
[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625774#comment-15625774 ]

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/200#discussion_r85957803

    --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java ---
    @@ -472,6 +475,7 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K
               break;
             }
           }
    +<<< HEAD
    --- End diff --

    Does this file have a merge conflict?

> Add trim option
> ---
>
>            Key: CARBONDATA-276
>            URL: https://issues.apache.org/jira/browse/CARBONDATA-276
>        Project: CarbonData
>     Issue Type: Bug
>       Reporter: Lionx
>       Assignee: Lionx
>       Priority: Minor
>
> Fix a bug and add trim option.
> Bug: When a string contains leading or trailing whitespace, the query
> result is null. This is because the dictionary ignores the leading
> and trailing whitespace and the csvInput does not.
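
The bug described in the issue above can be pictured with a small sketch: a dictionary keyed by trimmed values, looked up with an untrimmed CSV field, returns nothing. This is an assumed, simplified model of the mismatch rather than CarbonData's dictionary code. `TrimMismatchSketch`, its methods, the surrogate-key numbering, and the sample values are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a simplified model of the dictionary / CSV input mismatch.
class TrimMismatchSketch {
    // Builds a value -> surrogate key map; keys are stored TRIMMED,
    // modelling a dictionary that ignores leading/trailing whitespace.
    public static Map<String, Integer> buildDictionary(String... rawValues) {
        Map<String, Integer> dictionary = new HashMap<>();
        for (String raw : rawValues) {
            dictionary.putIfAbsent(raw.trim(), dictionary.size() + 1);
        }
        return dictionary;
    }

    // Looks up a CSV field; with useTrim == false the raw field is used,
    // so a value with surrounding whitespace is not found (null).
    public static Integer lookup(Map<String, Integer> dictionary, String csvValue, boolean useTrim) {
        return dictionary.get(useTrim ? csvValue.trim() : csvValue);
    }

    public static void main(String[] args) {
        Map<String, Integer> dictionary = buildDictionary(" beijing ", "shanghai");
        System.out.println(lookup(dictionary, " beijing ", false)); // prints null
        System.out.println(lookup(dictionary, " beijing ", true));  // prints 1
    }
}
```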
[jira] [Commented] (CARBONDATA-276) Add trim option
[ https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625769#comment-15625769 ]

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/200#discussion_r85957411

    --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenMeta.java ---
    @@ -1694,5 +1699,19 @@ public void setTableOption(String tableOption) {
       public TableOptionWrapper getTableOptionWrapper() {
         return tableOptionWrapper;
       }
    +
    +  public String getIsUseTrim() {
    +    return isUseTrim;
    +  }
    +
    +  public void setIsUseTrim(Boolean[] isUseTrim) {
    +    for (Boolean flag: isUseTrim) {
    +      if (flag) {
    +        this.isUseTrim += "T";
    --- End diff --

    Use TRUE/FALSE for better readability

> Add trim option
> ---
>
>            Key: CARBONDATA-276
>            URL: https://issues.apache.org/jira/browse/CARBONDATA-276
>        Project: CarbonData
>     Issue Type: Bug
>       Reporter: Lionx
>       Assignee: Lionx
>       Priority: Minor
>
> Fix a bug and add trim option.
> Bug: When a string contains leading or trailing whitespace, the query
> result is null. This is because the dictionary ignores the leading
> and trailing whitespace and the csvInput does not.
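
One way to read the reviewer's TRUE/FALSE suggestion above is sketched below: the per-column trim flags are serialized as a comma-separated list of `TRUE`/`FALSE` tokens and parsed back. The comma delimiter, the class `TrimFlagsSketch`, and both method names are assumptions. The quoted diff shows only the `"T"` branch of the original encoding, so this is not a reconstruction of the final patch.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Sketch only: hypothetical encoding of per-column trim flags.
class TrimFlagsSketch {
    // Encodes flags as "TRUE,FALSE,..." (the comma delimiter is an assumption).
    public static String encode(Boolean[] flags) {
        return Arrays.stream(flags)
                .map(flag -> flag ? "TRUE" : "FALSE")
                .collect(Collectors.joining(","));
    }

    // Decodes the string back into flags; an empty string means no columns.
    public static boolean[] decode(String encoded) {
        if (encoded.isEmpty()) {
            return new boolean[0];
        }
        String[] tokens = encoded.split(",");
        boolean[] flags = new boolean[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Boolean.parseBoolean is case-insensitive, so "TRUE" maps to true.
            flags[i] = Boolean.parseBoolean(tokens[i].trim());
        }
        return flags;
    }

    public static void main(String[] args) {
        String encoded = encode(new Boolean[] {true, false, true});
        System.out.println(encoded);                          // prints TRUE,FALSE,TRUE
        System.out.println(Arrays.toString(decode(encoded))); // prints [true, false, true]
    }
}
```

Compared with appending single letters, explicit tokens with a delimiter keep the serialized form readable in step metadata, at the cost of a slightly longer string.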
[jira] [Created] (CARBONDATA-354) Query execute successfully even not argument given in count function
Prabhat Kashyap created CARBONDATA-354:
---

   Summary: Query execute successfully even not argument given in count function
       Key: CARBONDATA-354
       URL: https://issues.apache.org/jira/browse/CARBONDATA-354
   Project: CarbonData
Issue Type: Bug
  Reporter: Prabhat Kashyap
  Priority: Minor

When I execute the following command:

select count() from tableName;

it gives no error and executes successfully, but the same query in Hive fails with the following exception:

FAILED: UDFArgumentException Argument expected