[GitHub] carbondata pull request #1516: [CARBONDATA-1729]Fix the compatibility issue ...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1516#discussion_r151621845 --- Diff: pom.xml --- @@ -453,9 +453,9 @@ - hadoop-2.7.2 + hadoop-2.2.0 --- End diff -- You can add a profile for hadoop-2.2.0; there is no need to overwrite hadoop-2.7.2. By default it should use hadoop-2.7.2. ---
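A profile along the lines the reviewer suggests might look like the following in pom.xml. This is only a sketch: the `hadoop.version` property name follows common parent-pom conventions and is an assumption here, not taken from the project's actual pom.

```xml
<!-- Sketch: hadoop-2.7.2 stays the default; build against 2.2.0 explicitly
     with `mvn -Phadoop-2.2.0 ...`. The property name is an assumption. -->
<profile>
  <id>hadoop-2.2.0</id>
  <properties>
    <hadoop.version>2.2.0</hadoop.version>
  </properties>
</profile>
```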
[GitHub] carbondata issue #1519: [CARBONDATA-1753][Streaming]Fix missing 'org.scalate...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/1519 Done ---
[jira] [Resolved] (CARBONDATA-1750) SegmentStatusManager.readLoadMetadata showing NPE if tablestatus file is empty
[ https://issues.apache.org/jira/browse/CARBONDATA-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-1750. -- Resolution: Fixed Assignee: QiangCai Fix Version/s: 1.3.0 > SegmentStatusManager.readLoadMetadata showing NPE if tablestatus file is empty > -- > > Key: CARBONDATA-1750 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1750 > Project: CarbonData > Issue Type: Bug >Reporter: QiangCai >Assignee: QiangCai >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 40m > Remaining Estimate: 0h > > SegmentStatusManager.readLoadMetadata showing NPE if tablestatus file is empty -- This message was sent by Atlassian JIRA (v6.4.14#64029)
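The guard that avoids the NPE can be sketched as below. All names here are hypothetical simplifications: the real `SegmentStatusManager.readLoadMetadata` deserializes a `LoadMetadataDetails[]` from the tablestatus JSON; this stand-in just splits a string to show the empty-file handling.

```java
/**
 * Minimal sketch of the NPE guard (hypothetical API): an empty or missing
 * tablestatus file means "no segments yet", not an error, so return an
 * empty array instead of letting a null result propagate.
 */
public class TableStatusSketch {
    static String[] readLoadMetadata(String tableStatusContent) {
        if (tableStatusContent == null || tableStatusContent.trim().isEmpty()) {
            return new String[0]; // empty file: no segments yet
        }
        // The real code would deserialize JSON here; a null result from the
        // parser should also map to an empty array.
        return tableStatusContent.split(",");
    }
}
```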
[GitHub] carbondata issue #1519: [CARBONDATA-1753]Fix missing 'org.scalatest.tools.Ru...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1519 LGTM ---
[GitHub] carbondata issue #1519: [CARBONDATA-1753]Fix missing 'org.scalatest.tools.Ru...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1519 @zzcclp can you change the title format, like: [CARBONDATA-1753][Streaming] Fix missing 'org.scalatest.tools.Runner' issue when run test with streaming module ---
[GitHub] carbondata pull request #1517: [CARBONDATA-1750] Fix NPE when tablestatus fi...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1517 ---
[GitHub] carbondata issue #1508: [CARBONDATA-1738] Block direct insert/load on pre-ag...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1508 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1215/ ---
[GitHub] carbondata pull request #1522: [HOTFIX] change to use store path in property...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1522 ---
[GitHub] carbondata issue #1522: [HOTFIX] change to use store path in property in tes...
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/1522 LGTM ---
[GitHub] carbondata pull request #1522: [HOTFIX] change to use store path in property...
GitHub user jackylk opened a pull request: https://github.com/apache/carbondata/pull/1522 [HOTFIX] change to use store path in property in testcase Change to use store path in property in testcase - [X] Any interfaces changed? No - [X] Any backward compatibility impacted? No - [X] Document update required? No - [X] Testing done No test case is added - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/jackylk/incubator-carbondata patch-3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1522.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1522 commit 7668cab9530d54bfc2ed46c3d3011d65d21a Author: Jacky Li Date: 2017-11-17T07:45:26Z [HOTFIX] change to use store path in property in testcase change to use store path in property in testcase ---
[GitHub] carbondata pull request #1521: [WIP] [CARBONDATA-1743] fix concurrent pre-agg...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/1521 [WIP] [CARBONDATA-1743] fix concurrent pre-agg creation and query Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata concurrent_query Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1521.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1521 commit 8952423e18a0c36afe08c5ead3730ff48fc3661e Author: kunal642 Date: 2017-11-17T06:43:25Z fix concurrent pre-agg creation and query ---
[GitHub] carbondata issue #1519: [CARBONDATA-1753]Fix missing 'org.scalatest.tools.Ru...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1519 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1214/ ---
[GitHub] carbondata pull request #1520: [CARBONDATA-1734] Ignore empty line while rea...
GitHub user dhatchayani opened a pull request: https://github.com/apache/carbondata/pull/1520 [CARBONDATA-1734] Ignore empty line while reading CSV Ignore/skip empty lines while loading. Load-level and system-level properties are added to control it. - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [X] Document update required? A new system-level property and a load-level property are added; the documentation should be updated accordingly. - [X] Testing done UT added - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dhatchayani/incubator-carbondata empty_line Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1520.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1520 commit 41b0a259ba94509245383a23029b6a6ce2366760 Author: dhatchayani Date: 2017-11-17T07:11:49Z [CARBONDATA-1734] Ignore empty line while reading CSV ---
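The behaviour the PR describes can be sketched as follows. The `skipEmptyLine` flag here is a hypothetical stand-in for the load/system-level property the PR adds; the names are not taken from the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the empty-line handling during CSV load: when skipEmptyLine is
 * true, blank rows are dropped instead of producing null records.
 */
public class CsvLineFilterSketch {
    static List<String> filterLines(List<String> lines, boolean skipEmptyLine) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (skipEmptyLine && (line == null || line.trim().isEmpty())) {
                continue; // drop blank rows
            }
            out.add(line);
        }
        return out;
    }
}
```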
[jira] [Updated] (CARBONDATA-1726) Carbon1.3.0-Streaming - Select query from spark-shell does not execute successfully for streaming table load
[ https://issues.apache.org/jira/browse/CARBONDATA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-1726: Description: Steps : // prepare csv file for batch loading cd /srv/spark2.2Bigdata/install/hadoop/datanode/bin // generate streamSample.csv 10001,batch_1,city_1,0.1,school_1:school_11$20 10002,batch_2,city_2,0.2,school_2:school_22$30 10003,batch_3,city_3,0.3,school_3:school_33$40 10004,batch_4,city_4,0.4,school_4:school_44$50 10005,batch_5,city_5,0.5,school_5:school_55$60 // put to hdfs /tmp/streamSample.csv ./hadoop fs -put streamSample.csv /tmp // spark-beeline cd /srv/spark2.2Bigdata/install/spark/sparkJdbc bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/sparkhive/warehouse" bin/beeline -u jdbc:hive2://10.18.98.34:23040 CREATE TABLE stream_table( id INT, name STRING, city STRING, salary FLOAT ) STORED BY 'carbondata' TBLPROPERTIES('streaming'='true', 'sort_columns'='name'); LOAD DATA LOCAL INPATH 'hdfs://hacluster/chetan/streamSample.csv' INTO TABLE stream_table OPTIONS('HEADER'='false'); // spark-shell cd /srv/spark2.2Bigdata/install/spark/sparkJdbc bin/spark-shell --master yarn-client import java.io.{File, PrintWriter} import java.net.ServerSocket import org.apache.spark.sql.{CarbonEnv, SparkSession} import org.apache.spark.sql.hive.CarbonRelation import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.util.CarbonProperties import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath} CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd") import 
org.apache.spark.sql.CarbonSession._ val carbonSession = SparkSession. builder(). appName("StreamExample"). config("spark.sql.warehouse.dir", "hdfs://hacluster/user/sparkhive/warehouse"). config("javax.jdo.option.ConnectionURL", "jdbc:mysql://10.18.98.34:3306/sparksql?characterEncoding=UTF-8"). config("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver"). config("javax.jdo.option.ConnectionPassword", "huawei"). config("javax.jdo.option.ConnectionUserName", "sparksql"). getOrCreateCarbonSession() carbonSession.sparkContext.setLogLevel("ERROR") carbonSession.sql("select * from stream_table").show *Issue : Select query from spark-shell does not execute successfully for streaming table load.* When the executor and driver cores and memory is increased while launching the spark shell the issue still occurs. bin/spark-shell --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 scala> import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.constants.CarbonCommonConstants scala> import org.apache.carbondata.core.util.CarbonProperties import org.apache.carbondata.core.util.CarbonProperties scala> import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath} import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath} scala> scala> CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd") res29: org.apache.carbondata.core.util.CarbonProperties = org.apache.carbondata.core.util.CarbonProperties@67b056e7 scala> scala> import org.apache.spark.sql.CarbonSession._ import org.apache.spark.sql.CarbonSession._ scala> scala> val carbonSession = SparkSession. | builder(). | appName("StreamExample"). | config("spark.sql.warehouse.dir", "hdfs://hacluster/user/sparkhive/warehouse"). | config("javax.jdo.option.ConnectionURL", "jdbc:mysql://10.18.98.34:3306/sparksql?characterEncoding=UTF-8"). 
| config("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver"). | config("javax.jdo.option.ConnectionPassword", "huawei"). | config("javax.jdo.option.ConnectionUserName", "sparksql"). | getOrCreateCarbonSession() carbonSession: org.apache.spark.sql.SparkSession = org.apache.spark.sql.CarbonSession@1d0590bc scala> | carbonSession.sparkContext.setLogLevel("ERROR") scala> carbonSession.sql("select * from stream_table").show org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 4 times, most recent failure: Lost task 0.3 in stage 25.0 (TID 65, BLR114269, executor 8): java.lang.IllegalStateException: unread block data at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2424) at java.io.ObjectInputStream.readObject0(Obje
[GitHub] carbondata issue #1516: [CARBONDATA-1729]Fix the compatibility issue with ha...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/1516 I have compiled successfully with the commands below: 1. mvn -Pspark-2.1 -Pbuild-with-format -Dspark.version=2.1.2 clean package; 2. mvn -Pspark-2.1 -Phadoop-2.2.0 -Pbuild-with-format -Dspark.version=2.1.2 -Dhadoop.version=2.6.0-cdh5.7.1; @QiangCai @jackylk @chenliang613 please review, thanks. ---
[jira] [Assigned] (CARBONDATA-1734) Ignore empty line while reading CSV
[ https://issues.apache.org/jira/browse/CARBONDATA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani reassigned CARBONDATA-1734: --- Assignee: dhatchayani (was: Akash R Nilugal) > Ignore empty line while reading CSV > --- > > Key: CARBONDATA-1734 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1734 > Project: CarbonData > Issue Type: Improvement >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > Ignore empty line while reading CSV file in LOAD -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1471#discussion_r151615805 --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/AbstractDataMapWriter.java --- @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.datamap.dev; + +import java.io.IOException; + +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.datastore.page.ColumnPage; +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.util.path.CarbonTablePath; + +/** + * Data Map writer + */ +public abstract class AbstractDataMapWriter { + + protected AbsoluteTableIdentifier identifier; + + protected String segmentId; + + protected String writeDirectoryPath; + + public AbstractDataMapWriter(AbsoluteTableIdentifier identifier, String segmentId, + String writeDirectoryPath) { +this.identifier = identifier; +this.segmentId = segmentId; +this.writeDirectoryPath = writeDirectoryPath; + } + + /** + * Start of new block notification. 
+ * + * @param blockId file name of the carbondata file + */ + public abstract void onBlockStart(String blockId); + + /** + * End of block notification + */ + public abstract void onBlockEnd(String blockId); + + /** + * Start of new blocklet notification. + * + * @param blockletId sequence number of blocklet in the block + */ + public abstract void onBlockletStart(int blockletId); + + /** + * End of blocklet notification + * + * @param blockletId sequence number of blocklet in the block + */ + public abstract void onBlockletEnd(int blockletId); + + /** + * Add the column pages row to the datamap, order of pages is same as `indexColumns` in + * DataMapMeta returned in DataMapFactory. + * Implementation should copy the content of `pages` as needed, because `pages` memory + * may be freed after this method returns, if using unsafe column page. + */ + public abstract void onPageAdded(int blockletId, int pageId, ColumnPage[] pages); + + /** + * This is called during closing of writer.So after this call no more data will be sent to this + * class. + */ + public abstract void finish(); + + /** + * It copies the file from temp folder to actual folder + * + * @param dataMapFile + * @throws IOException + */ + protected void commitFile(String dataMapFile) throws IOException { --- End diff -- What if anything failed inside this function, who will catch IOException and handle it? ---
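The reviewer's concern about error handling in `commitFile` can be illustrated with a hedged sketch (hypothetical names; `java.nio.file` used for brevity rather than the project's `FileFactory`/`CarbonUtil` helpers): if the copy fails, the partial target file should be cleaned up and the `IOException` propagated so the caller, e.g. `finish()`, can abort the write rather than silently continuing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical version of commitFile: copy a temp datamap file into the
 * store directory, cleaning up on failure and rethrowing so the caller
 * decides how to abort.
 */
public class CommitSketch {
    static Path commitFile(Path tempFile, Path storeDir) throws IOException {
        Path target = storeDir.resolve(tempFile.getFileName());
        try {
            Files.copy(tempFile, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(target); // don't leave a partial file behind
            throw e;                      // surface the failure to the caller
        }
        Files.delete(tempFile); // copy succeeded; temp file no longer needed
        return target;
    }
}
```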
[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1471#discussion_r151615573 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java --- @@ -755,7 +758,8 @@ private CarbonInputSplit convertToCarbonInputSplit(ExtendedBlocklet blocklet) org.apache.carbondata.hadoop.CarbonInputSplit.from(blocklet.getSegmentId(), new FileSplit(new Path(blocklet.getPath()), 0, blocklet.getLength(), blocklet.getLocations()), -ColumnarFormatVersion.valueOf((short) blocklet.getDetailInfo().getVersionNumber())); +ColumnarFormatVersion.valueOf((short) blocklet.getDetailInfo().getVersionNumber()), +blocklet.getDataMapWriterPath()); --- End diff -- indentation not correct ---
[jira] [Updated] (CARBONDATA-1726) Carbon1.3.0-Streaming - Select query from spark-shell does not execute successfully for streaming table load
[ https://issues.apache.org/jira/browse/CARBONDATA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-1726: Description: Steps : // prepare csv file for batch loading cd /srv/spark2.2Bigdata/install/hadoop/datanode/bin // generate streamSample.csv 10001,batch_1,city_1,0.1,school_1:school_11$20 10002,batch_2,city_2,0.2,school_2:school_22$30 10003,batch_3,city_3,0.3,school_3:school_33$40 10004,batch_4,city_4,0.4,school_4:school_44$50 10005,batch_5,city_5,0.5,school_5:school_55$60 // put to hdfs /tmp/streamSample.csv ./hadoop fs -put streamSample.csv /tmp // spark-beeline cd /srv/spark2.2Bigdata/install/spark/sparkJdbc bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/sparkhive/warehouse" bin/beeline -u jdbc:hive2://10.18.98.34:23040 CREATE TABLE stream_table( id INT, name STRING, city STRING, salary FLOAT ) STORED BY 'carbondata' TBLPROPERTIES('streaming'='true', 'sort_columns'='name'); LOAD DATA LOCAL INPATH 'hdfs://hacluster/chetan/streamSample.csv' INTO TABLE stream_table OPTIONS('HEADER'='false'); // spark-shell cd /srv/spark2.2Bigdata/install/spark/sparkJdbc bin/spark-shell --master yarn-client import java.io.{File, PrintWriter} import java.net.ServerSocket import org.apache.spark.sql.{CarbonEnv, SparkSession} import org.apache.spark.sql.hive.CarbonRelation import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.util.CarbonProperties import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath} CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd") import 
org.apache.spark.sql.CarbonSession._ val carbonSession = SparkSession. builder(). appName("StreamExample"). config("spark.sql.warehouse.dir", "hdfs://hacluster/user/sparkhive/warehouse"). config("javax.jdo.option.ConnectionURL", "jdbc:mysql://10.18.98.34:3306/sparksql?characterEncoding=UTF-8"). config("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver"). config("javax.jdo.option.ConnectionPassword", "huawei"). config("javax.jdo.option.ConnectionUserName", "sparksql"). getOrCreateCarbonSession() carbonSession.sparkContext.setLogLevel("ERROR") carbonSession.sql("select * from stream_table").show Issue : Select query from spark-shell does not execute successfully for streaming table load. When the executor and driver cores and memory is increased while launching the spark shell the issue still occurs. bin/spark-shell --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 scala> carbonSession.sql("select * from stream_table").show org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 4 times, most recent failure: Lost task 0.3 in stage 25.0 (TID 65, BLR114269, executor 8): java.lang.IllegalStateException: unread block data at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2424) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1383) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114) at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:258) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAG
[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1471#discussion_r151615263 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java --- @@ -574,7 +482,9 @@ private CopyThread(String fileName) { * @throws Exception if unable to compute a result */ @Override public Void call() throws Exception { - copyCarbonDataFileToCarbonStorePath(fileName); + CarbonUtil.copyCarbonDataFileToCarbonStorePath(fileName, --- End diff -- move parameter to next line ---
[GitHub] carbondata issue #1516: [CARBONDATA-1729]Fix the compatibility issue with ha...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1516 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1213/ ---
[GitHub] carbondata issue #1515: [CARBONDATA-1751] Modify sys.err to AnalysisExceptio...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/1515 Please review it @jackylk @QiangCai ---
[GitHub] carbondata issue #1517: [CARBONDATA-1750] Fix NPE when tablestatus file is e...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1517 LGTM ---
[jira] [Resolved] (CARBONDATA-1326) Fixed high priority findbug issues
[ https://issues.apache.org/jira/browse/CARBONDATA-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-1326. -- Resolution: Fixed Fix Version/s: 1.3.0 > Fixed high priority findbug issues > -- > > Key: CARBONDATA-1326 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1326 > Project: CarbonData > Issue Type: Bug >Reporter: Manish Gupta >Assignee: Manish Gupta >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 23h 50m > Remaining Estimate: 0h > > Currently there are a lot of findbug issues in the carbondata code. These need > to be prioritized and fixed. So through this JIRA all high-priority findbug > issues are addressed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1507: [CARBONDATA-1326] Fixed high priority findbug...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1507 ---
[GitHub] carbondata issue #1507: [CARBONDATA-1326] Fixed high priority findbug issue
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1507 LGTM ---
[GitHub] carbondata pull request #1509: [CARBONDATA-1739] Clean up store path interfa...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1509 ---
[GitHub] carbondata issue #1509: [CARBONDATA-1739] Clean up store path interface
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/1509 LGTM ---
[GitHub] carbondata issue #1491: [CARBONDATA-1651] [Supported Boolean Type When Savin...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1491 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1212/ ---
[GitHub] carbondata pull request #1500: [CARBONDATA-1717]Remove spark broadcast for g...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1500 ---
[GitHub] carbondata pull request #1519: [CARBONDATA-1753]Fix missing 'org.scalatest.t...
GitHub user zzcclp opened a pull request: https://github.com/apache/carbondata/pull/1519 [CARBONDATA-1753]Fix missing 'org.scalatest.tools.Runner' issue when run test with streaming module Fix missing 'org.scalatest.tools.Runner' issue when run test with streaming module Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zzcclp/carbondata CARBONDATA-1753 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1519.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1519 commit 452d442e2a2779b5bb870a492b3b6ec7994b4161 Author: Zhang Zhichao <441586...@qq.com> Date: 2017-11-17T06:27:23Z [CARBONDATA-1753]Fix missing 'org.scalatest.tools.Runner' issue when run test with streaming module Fix missing 'org.scalatest.tools.Runner' issue when run test with streaming module ---
[GitHub] carbondata issue #1500: [CARBONDATA-1717]Remove spark broadcast for gettting...
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/1500 LGTM ---
[jira] [Created] (CARBONDATA-1753) Missing 'org.scalatest.tools.Runner' when run test with streaming module
Zhichao Zhang created CARBONDATA-1753:
--------------------------------------

Summary: Missing 'org.scalatest.tools.Runner' when run test with streaming module
Key: CARBONDATA-1753
URL: https://issues.apache.org/jira/browse/CARBONDATA-1753
Project: CarbonData
Issue Type: Bug
Components: build
Affects Versions: 1.3.0
Reporter: Zhichao Zhang
Assignee: Zhichao Zhang
Priority: Minor
Fix For: 1.3.0

'org.scalatest.tools.Runner' is missing when running tests with the streaming module. A scalatest dependency needs to be added to the streaming module's pom.xml.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
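The fix comes down to declaring scalatest in the streaming module's pom.xml so that `org.scalatest.tools.Runner` is on the test classpath. A minimal sketch of such a dependency block (the `${scala.binary.version}` property and `test` scope are illustrative assumptions, not copied from the actual patch):

```xml
<!-- Illustrative sketch: make org.scalatest.tools.Runner available
     to the streaming module's test run. -->
<dependency>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest_${scala.binary.version}</artifactId>
  <scope>test</scope>
</dependency>
```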
[GitHub] carbondata issue #1507: [CARBONDATA-1326] Fixed high priority findbug issue
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1507 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1211/ ---
[GitHub] carbondata issue #1508: [CARBONDATA-1738] Block direct insert/load on pre-ag...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1508 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1210/ ---
[GitHub] carbondata issue #1515: [CARBONDATA-1751] Modify sys.err to AnalysisExceptio...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1515 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1209/ ---
[jira] [Updated] (CARBONDATA-1713) Carbon1.3.0-Pre-AggregateTable - Aggregate query on main table fails after creating pre-aggregate table when upper case used for column name
[ https://issues.apache.org/jira/browse/CARBONDATA-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramakrishna S updated CARBONDATA-1713: -- Summary: Carbon1.3.0-Pre-AggregateTable - Aggregate query on main table fails after creating pre-aggregate table when upper case used for column name (was: Carbon1.3.0-Pre-AggregateTable - Aggregate query on main table fails after creating pre-aggregate table) > Carbon1.3.0-Pre-AggregateTable - Aggregate query on main table fails after > creating pre-aggregate table when upper case used for column name > > > Key: CARBONDATA-1713 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1713 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 > Environment: ANT Test cluster - 3 node >Reporter: Ramakrishna S >Assignee: kumar vishal >Priority: Minor > Labels: Functional, sanity > Fix For: 1.3.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > 0: jdbc:hive2://10.18.98.34:23040> load data inpath > "hdfs://hacluster/user/test/lineitem.tbl.1" into table lineitem > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > Error: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or > view 'lineitem' not found in database 'default'; (state=,code=0) > 0: jdbc:hive2://10.18.98.34:23040> create table if not exists lineitem( > 0: jdbc:hive2://10.18.98.34:23040> L_SHIPDATE string, > 0: jdbc:hive2://10.18.98.34:23040> L_SHIPMODE string, > 0: jdbc:hive2://10.18.98.34:23040> L_SHIPINSTRUCT string, > 0: jdbc:hive2://10.18.98.34:23040> L_RETURNFLAG string, > 0: jdbc:hive2://10.18.98.34:23040> L_RECEIPTDATE string, > 0: jdbc:hive2://10.18.98.34:23040> L_ORDERKEY string, > 0: jdbc:hive2://10.18.98.34:23040> L_PARTKEY string, > 0: jdbc:hive2://10.18.98.34:23040> L_SUPPKEY string, > 0: 
jdbc:hive2://10.18.98.34:23040> L_LINENUMBER int, > 0: jdbc:hive2://10.18.98.34:23040> L_QUANTITY double, > 0: jdbc:hive2://10.18.98.34:23040> L_EXTENDEDPRICE double, > 0: jdbc:hive2://10.18.98.34:23040> L_DISCOUNT double, > 0: jdbc:hive2://10.18.98.34:23040> L_TAX double, > 0: jdbc:hive2://10.18.98.34:23040> L_LINESTATUS string, > 0: jdbc:hive2://10.18.98.34:23040> L_COMMITDATE string, > 0: jdbc:hive2://10.18.98.34:23040> L_COMMENT string > 0: jdbc:hive2://10.18.98.34:23040> ) STORED BY 'org.apache.carbondata.format' > 0: jdbc:hive2://10.18.98.34:23040> TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (0.338 seconds) > 0: jdbc:hive2://10.18.98.34:23040> load data inpath > "hdfs://hacluster/user/test/lineitem.tbl.1" into table lineitem > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (48.634 seconds) > 0: jdbc:hive2://10.18.98.34:23040> create datamap agr_lineitem ON TABLE > lineitem USING "org.apache.carbondata.datamap.AggregateDataMapHandler" as > select L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from > lineitem group by L_RETURNFLAG, L_LINESTATUS; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (16.552 seconds) > 0: jdbc:hive2://10.18.98.34:23040> select > L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from lineitem > group by L_RETURNFLAG, L_LINESTATUS; > Error: org.apache.spark.sql.AnalysisException: Column doesnot exists in Pre > Aggregate table; (state=,code=0) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (CARBONDATA-1713) Carbon1.3.0-Pre-AggregateTable - Aggregate query on main table fails after creating pre-aggregate table
[ https://issues.apache.org/jira/browse/CARBONDATA-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253213#comment-16253213 ] Ramakrishna S edited comment on CARBONDATA-1713 at 11/17/17 5:05 AM: - Changing severity based on the clarification provided, will use lower case for query till this issue is fixed. was (Author: ram@huawei): Changing severity based on the clarification given. > Carbon1.3.0-Pre-AggregateTable - Aggregate query on main table fails after > creating pre-aggregate table > --- > > Key: CARBONDATA-1713 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1713 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 > Environment: ANT Test cluster - 3 node >Reporter: Ramakrishna S >Assignee: kumar vishal >Priority: Minor > Labels: Functional, sanity > Fix For: 1.3.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > 0: jdbc:hive2://10.18.98.34:23040> load data inpath > "hdfs://hacluster/user/test/lineitem.tbl.1" into table lineitem > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > Error: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or > view 'lineitem' not found in database 'default'; (state=,code=0) > 0: jdbc:hive2://10.18.98.34:23040> create table if not exists lineitem( > 0: jdbc:hive2://10.18.98.34:23040> L_SHIPDATE string, > 0: jdbc:hive2://10.18.98.34:23040> L_SHIPMODE string, > 0: jdbc:hive2://10.18.98.34:23040> L_SHIPINSTRUCT string, > 0: jdbc:hive2://10.18.98.34:23040> L_RETURNFLAG string, > 0: jdbc:hive2://10.18.98.34:23040> L_RECEIPTDATE string, > 0: jdbc:hive2://10.18.98.34:23040> L_ORDERKEY string, > 0: jdbc:hive2://10.18.98.34:23040> L_PARTKEY string, > 0: jdbc:hive2://10.18.98.34:23040> L_SUPPKEY string, > 0: jdbc:hive2://10.18.98.34:23040> L_LINENUMBER 
int, > 0: jdbc:hive2://10.18.98.34:23040> L_QUANTITY double, > 0: jdbc:hive2://10.18.98.34:23040> L_EXTENDEDPRICE double, > 0: jdbc:hive2://10.18.98.34:23040> L_DISCOUNT double, > 0: jdbc:hive2://10.18.98.34:23040> L_TAX double, > 0: jdbc:hive2://10.18.98.34:23040> L_LINESTATUS string, > 0: jdbc:hive2://10.18.98.34:23040> L_COMMITDATE string, > 0: jdbc:hive2://10.18.98.34:23040> L_COMMENT string > 0: jdbc:hive2://10.18.98.34:23040> ) STORED BY 'org.apache.carbondata.format' > 0: jdbc:hive2://10.18.98.34:23040> TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (0.338 seconds) > 0: jdbc:hive2://10.18.98.34:23040> load data inpath > "hdfs://hacluster/user/test/lineitem.tbl.1" into table lineitem > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (48.634 seconds) > 0: jdbc:hive2://10.18.98.34:23040> create datamap agr_lineitem ON TABLE > lineitem USING "org.apache.carbondata.datamap.AggregateDataMapHandler" as > select L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from > lineitem group by L_RETURNFLAG, L_LINESTATUS; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (16.552 seconds) > 0: jdbc:hive2://10.18.98.34:23040> select > L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from lineitem > group by L_RETURNFLAG, L_LINESTATUS; > Error: org.apache.spark.sql.AnalysisException: Column doesnot exists in Pre > Aggregate table; (state=,code=0) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1726) Carbon1.3.0-Streaming - Select query from spark-shell does not execute successfully for streaming table load
[ https://issues.apache.org/jira/browse/CARBONDATA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-1726: Description: Steps : // prepare csv file for batch loading cd /srv/spark2.2Bigdata/install/hadoop/datanode/bin // generate streamSample.csv 10001,batch_1,city_1,0.1,school_1:school_11$20 10002,batch_2,city_2,0.2,school_2:school_22$30 10003,batch_3,city_3,0.3,school_3:school_33$40 10004,batch_4,city_4,0.4,school_4:school_44$50 10005,batch_5,city_5,0.5,school_5:school_55$60 // put to hdfs /tmp/streamSample.csv ./hadoop fs -put streamSample.csv /tmp // spark-beeline cd /srv/spark2.2Bigdata/install/spark/sparkJdbc bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/sparkhive/warehouse" bin/beeline -u jdbc:hive2://10.18.98.34:23040 CREATE TABLE stream_table( id INT, name STRING, city STRING, salary FLOAT ) STORED BY 'carbondata' TBLPROPERTIES('streaming'='true', 'sort_columns'='name'); LOAD DATA LOCAL INPATH 'hdfs://hacluster/chetan/streamSample.csv' INTO TABLE stream_table OPTIONS('HEADER'='false'); // spark-shell cd /srv/spark2.2Bigdata/install/spark/sparkJdbc bin/spark-shell --master yarn-client import java.io.{File, PrintWriter} import java.net.ServerSocket import org.apache.spark.sql.{CarbonEnv, SparkSession} import org.apache.spark.sql.hive.CarbonRelation import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.util.CarbonProperties import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath} CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd") import 
org.apache.spark.sql.CarbonSession._ val carbonSession = SparkSession. builder(). appName("StreamExample"). config("spark.sql.warehouse.dir", "hdfs://hacluster/user/sparkhive/warehouse"). config("javax.jdo.option.ConnectionURL", "jdbc:mysql://10.18.98.34:3306/sparksql?characterEncoding=UTF-8"). config("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver"). config("javax.jdo.option.ConnectionPassword", "huawei"). config("javax.jdo.option.ConnectionUserName", "sparksql"). getOrCreateCarbonSession() carbonSession.sparkContext.setLogLevel("ERROR") carbonSession.sql("select * from stream_table").show Issue : Select query from spark-shell does not execute successfully for streaming table load. In AM logs for the failed attempt the below error is displayed. AM Container for appattempt_1510838225027_0014_01 exited with exitCode: 11 For more detailed output, check the application tracking page:http://BLR114278:45020/cluster/app/application_1510838225027_0014 Then click on links to logs of each attempt. Diagnostics: Exception from container-launch. 
Container id: container_e06_1510838225027_0014_01_01 Exit code: 11 Stack trace: ExitCodeException exitCode=11: at org.apache.hadoop.util.Shell.runCommand(Shell.java:636) at org.apache.hadoop.util.Shell.run(Shell.java:533) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:829) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:224) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:313) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:88) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Container exited with a non-zero exit code 11 and the last 4096 bytes from the error logs are : op.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:313) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:88) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Container exited with a non-zero exit code 1 and the last 4096 bytes from the error logs are : Java HotSpot(TM) 64-Bit Server VM warning: Setting CompressedClassSpaceSize has no effect when compressed class pointers are not used | org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66) 2017-11
[GitHub] carbondata issue #1509: [CARBONDATA-1739] Clean up store path interface
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1509 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1208/ ---
[GitHub] carbondata issue #1491: [CARBONDATA-1651] [Supported Boolean Type When Savin...
Github user anubhav100 commented on the issue: https://github.com/apache/carbondata/pull/1491 retest this please ---
[GitHub] carbondata issue #1518: [CARBONDATA-1752] There are some scalastyle error sh...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1518 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1207/ ---
[jira] [Updated] (CARBONDATA-1726) Carbon1.3.0-Streaming - Select query from spark-shell does not execute successfully for streaming table load
[ https://issues.apache.org/jira/browse/CARBONDATA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-1726: Priority: Blocker (was: Major) > Carbon1.3.0-Streaming - Select query from spark-shell does not execute > successfully for streaming table load > > > Key: CARBONDATA-1726 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1726 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 > Environment: 3 node ant cluster SUSE 11 SP4 >Reporter: Chetan Bhat >Priority: Blocker > Labels: Functional > > Steps : > // prepare csv file for batch loading > cd /srv/spark2.2Bigdata/install/hadoop/datanode/bin > // generate streamSample.csv > 10001,batch_1,city_1,0.1,school_1:school_11$20 > 10002,batch_2,city_2,0.2,school_2:school_22$30 > 10003,batch_3,city_3,0.3,school_3:school_33$40 > 10004,batch_4,city_4,0.4,school_4:school_44$50 > 10005,batch_5,city_5,0.5,school_5:school_55$60 > // put to hdfs /tmp/streamSample.csv > ./hadoop fs -put streamSample.csv /tmp > // spark-beeline > cd /srv/spark2.2Bigdata/install/spark/sparkJdbc > bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores > 5 --driver-memory 5G --num-executors 3 --class > org.apache.carbondata.spark.thriftserver.CarbonThriftServer > /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar > "hdfs://hacluster/user/sparkhive/warehouse" > bin/beeline -u jdbc:hive2://10.18.98.34:23040 > CREATE TABLE stream_table( > id INT, > name STRING, > city STRING, > salary FLOAT > ) > STORED BY 'carbondata' > TBLPROPERTIES('streaming'='true', 'sort_columns'='name'); > LOAD DATA LOCAL INPATH 'hdfs://hacluster/chetan/streamSample.csv' INTO TABLE > stream_table OPTIONS('HEADER'='false'); > // spark-shell > cd /srv/spark2.2Bigdata/install/spark/sparkJdbc > bin/spark-shell --master yarn-client > import java.io.{File, PrintWriter} > import java.net.ServerSocket > import 
org.apache.spark.sql.{CarbonEnv, SparkSession} > import org.apache.spark.sql.hive.CarbonRelation > import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} > import org.apache.carbondata.core.constants.CarbonCommonConstants > import org.apache.carbondata.core.util.CarbonProperties > import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath} > CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, > "/MM/dd") > import org.apache.spark.sql.CarbonSession._ > val carbonSession = SparkSession. > builder(). > appName("StreamExample"). > config("spark.sql.warehouse.dir", > "hdfs://hacluster/user/sparkhive/warehouse"). > config("javax.jdo.option.ConnectionURL", > "jdbc:mysql://10.18.98.34:3306/sparksql?characterEncoding=UTF-8"). > config("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver"). > config("javax.jdo.option.ConnectionPassword", "huawei"). > config("javax.jdo.option.ConnectionUserName", "sparksql"). > getOrCreateCarbonSession() > > carbonSession.sparkContext.setLogLevel("ERROR") > carbonSession.sql("select * from stream_table").show > Issue : Select query from spark-shell does not execute successfully for > streaming table load. > Expected : Select query from spark-shell should execute successfully for > streaming table load. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1516: [CARBONDATA-1729]Fix the compatibility issue with ha...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1516 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1206/ ---
[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1471#discussion_r151598459

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/AbstractDataMapWriter.java ---
@@ -0,0 +1,110 @@
+/*
--- End diff --

Can you explain why "DataMapWriter.java" was changed to "AbstractDataMapWriter.java"? Is it to make it easier for users to customize other types of datamap writer?

---
[GitHub] carbondata issue #1517: [CARBONDATA-1750] Fix NPE when tablestatus file is e...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1517 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1204/ ---
[GitHub] carbondata pull request #1518: [CARBONDATA-1752] There are some scalastyle e...
GitHub user xubo245 opened a pull request: https://github.com/apache/carbondata/pull/1518

[CARBONDATA-1752] There are some scalastyle error should be optimized in CarbonData

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed? No
- [ ] Any backward compatibility impacted? No
- [ ] Document update required? No
- [ ] Testing done No
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. No

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xubo245/carbondata fixStyle

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1518.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1518

commit 8a754e3527f5c2de47035d248b7ddb2d8181ff67
Author: xubo245 <601450...@qq.com>
Date: 2017-11-17T03:25:14Z
[CARBONDATA-1752] There are some scalastyle error should be optimized in CarbonData

---
[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1471#discussion_r151595971

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapMeta.java ---
@@ -19,15 +19,15 @@
 import java.util.List;
-import org.apache.carbondata.core.indexstore.schema.FilterType;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
 public class DataMapMeta {
   private List<String> indexedColumns;
-  private FilterType optimizedOperation;
+  private List<ExpressionType> optimizedOperation;
--- End diff --

In ExpressionType there is no "like" expression.

---
[jira] [Created] (CARBONDATA-1752) There are some scalastyle error should be optimized in CarbonData
xubo245 created CARBONDATA-1752:
--------------------------------

Summary: There are some scalastyle error should be optimized in CarbonData
Key: CARBONDATA-1752
URL: https://issues.apache.org/jira/browse/CARBONDATA-1752
Project: CarbonData
Issue Type: Bug
Components: file-format
Affects Versions: 1.2.0
Reporter: xubo245
Assignee: xubo245
Priority: Minor
Fix For: 1.3.0

There are some scalastyle errors in CarbonData that should be fixed, including removing unused imports, tidying up method definitions, and so on.
[GitHub] carbondata issue #1515: [CARBONDATA-1751] Modify sys.err to AnalysisExceptio...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1515 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1203/ ---
[GitHub] carbondata issue #1509: [CARBONDATA-1739] Clean up store path interface
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1509 @zzcclp because I implement those PR one by one, so it is on top of #1504 ---
[GitHub] carbondata pull request #1517: [CARBONDATA-1750] Fix NPE when tablestatus fi...
GitHub user QiangCai opened a pull request: https://github.com/apache/carbondata/pull/1517

[CARBONDATA-1750] Fix NPE when tablestatus file is empty

- [x] Any interfaces changed? no
- [x] Any backward compatibility impacted? no
- [x] Document update required? no
- [x] Testing done no
- [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/QiangCai/carbondata segmentstatus

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1517.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1517

commit 0333be2f614da59e0622a4b82d7c2fe9dccbf1b1
Author: QiangCai
Date: 2017-11-17T02:45:13Z
fix npe when tablestatus is empty

---
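The guard this PR describes can be sketched as follows. `SegmentStatusGuard.readLoadMetadata` is a simplified, hypothetical stand-in for `SegmentStatusManager.readLoadMetadata` (the real method reads the tablestatus file through atomic file operations and parses it with Gson; the comma split below is only a placeholder for parsing):

```java
// Simplified stand-in for SegmentStatusManager.readLoadMetadata.
class SegmentStatusGuard {
    public static String[] readLoadMetadata(String tableStatusContent) {
        // Previously an empty tablestatus file led to a NullPointerException;
        // return an empty array for null or blank content instead.
        if (tableStatusContent == null || tableStatusContent.trim().isEmpty()) {
            return new String[0];
        }
        // Placeholder for real Gson parsing of LoadMetadataDetails entries.
        return tableStatusContent.split(",");
    }
}
```

The essential point is the early return of an empty result for null or blank content, replacing the NullPointerException.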
[GitHub] carbondata issue #1509: [CARBONDATA-1739] Clean up store path interface
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1509 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1201/ ---
[GitHub] carbondata issue #1516: [CARBONDATA-1729]Fix the compatibility issue with ha...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1516 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1202/ ---
[GitHub] carbondata issue #1516: [CARBONDATA-1729]Fix the compatibility issue with ha...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/1516 @QiangCai @jackylk please review, thanks. ---
[GitHub] carbondata pull request #1516: [CARBONDATA-1729]Fix the compatibility issue ...
GitHub user zzcclp opened a pull request: https://github.com/apache/carbondata/pull/1516

[CARBONDATA-1729] Fix the compatibility issue with hadoop <= 2.6 and 2.7

1. Recover profile of 'hadoop-2.2.0' to pom.xml
2. Use reflection mechanism to implement 'truncate' method

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done. Please provide details on:
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zzcclp/carbondata CARBONDATA-1729

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1516.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1516

commit 66e349b277251ebfb46adc48a833569de32e1799
Author: Zhang Zhichao <441586...@qq.com>
Date: 2017-11-17T02:29:12Z
[CARBONDATA-1729] Fix the compatibility issue with hadoop <= 2.6 and 2.7
1. Recover profile of 'hadoop-2.2.0' to pom.xml
2. Use reflection mechanism to implement 'truncate' method

---
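Point 2 (the reflection mechanism for 'truncate') can be sketched as follows. `OldFs` and `NewFs` are hypothetical stand-ins for `org.apache.hadoop.fs.FileSystem`; the real Hadoop method is `truncate(Path, long)` (added in Hadoop 2.7), while `String` is used here only to keep the sketch self-contained. The point is the capability probe: look the method up via reflection, and fall back when it is absent (Hadoop <= 2.6):

```java
import java.lang.reflect.Method;

// Probe for a truncate method via reflection so the same code runs on both
// Hadoop <= 2.6 (no FileSystem.truncate) and Hadoop >= 2.7.
class TruncateCompat {
    public static class OldFs { }          // models a Hadoop <= 2.6 FileSystem
    public static class NewFs {            // models a Hadoop >= 2.7 FileSystem
        public boolean truncate(String path, long newLength) { return true; }
    }

    public static Method findTruncate(Class<?> fsClass) {
        try {
            return fsClass.getMethod("truncate", String.class, long.class);
        } catch (NoSuchMethodException e) {
            return null;  // method absent: caller must use a fallback
        }
    }

    public static boolean truncateOrFallback(Object fs, String path, long len) {
        Method m = findTruncate(fs.getClass());
        if (m == null) {
            // Fallback sketch for old Hadoop: e.g. copy the first len bytes
            // to a new file and rename it over the original.
            return false;
        }
        try {
            return (Boolean) m.invoke(fs, path, len);
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }
}
```

Resolving the `Method` through reflection keeps the compile-time dependency on the Hadoop 2.7 API out of the class, which is why the hadoop-2.2.0 profile can build again.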
[GitHub] carbondata pull request #1515: [CARBONDATA-1751] Modify sys.err to AnalysisE...
GitHub user xubo245 opened a pull request: https://github.com/apache/carbondata/pull/1515

[CARBONDATA-1751] Modify sys.err to AnalysisException when users run related operations except IUD, compaction and alter

Carbon prints improper error messages; for example, it prints a system error when users run create table with the same column name, but it should print the related exception information. So we change the sys.error method to AnalysisException for related operations, except IUD, compaction and alter.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done. Please provide details on:
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xubo245/carbondata fixSysError

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1515.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1515

commit 696f02d4ece90308e729d3d7fed222aa58b0e9c9
Author: xubo245 <601450...@qq.com>
Date: 2017-11-17T02:48:31Z
[CARBONDATA-1751] Modify sys.err to AnalysisException when users run related operations except IUD, compaction and alter

---
[jira] [Created] (CARBONDATA-1751) Modify sys.err to AnalysisException when users run related operations except IUD, compaction and alter
xubo245 created CARBONDATA-1751:
--------------------------------

Summary: Modify sys.err to AnalysisException when users run related operations except IUD, compaction and alter
Key: CARBONDATA-1751
URL: https://issues.apache.org/jira/browse/CARBONDATA-1751
Project: CarbonData
Issue Type: Bug
Components: spark-integration
Affects Versions: 1.2.0
Reporter: xubo245
Assignee: xubo245
Priority: Minor
Fix For: 1.3.0

Carbon prints improper error messages; for example, it prints a system error when users run create table with the same column name, but it should print the related exception information. So we change the sys.error method to AnalysisException for related operations, except IUD, compaction and alter.
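The intent of the change can be sketched with a duplicate-column check: raise a descriptive, analysis-style exception instead of calling `sys.error`, which surfaces as a bare `RuntimeException`. `CarbonAnalysisException` below is a hypothetical stand-in for Spark's `org.apache.spark.sql.AnalysisException`, and the validation logic is illustrative, not actual CarbonData parser code:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch of the behaviour change: a descriptive analysis-style exception
// replaces the generic system error for a duplicate column name.
class DuplicateColumnCheck {
    // Hypothetical stand-in for org.apache.spark.sql.AnalysisException.
    public static class CarbonAnalysisException extends RuntimeException {
        public CarbonAnalysisException(String message) { super(message); }
    }

    public static void validateColumnNames(List<String> columns) {
        Set<String> seen = new HashSet<>();
        for (String c : columns) {
            // SQL identifiers are matched case-insensitively.
            if (!seen.add(c.toLowerCase(Locale.ROOT))) {
                throw new CarbonAnalysisException(
                    "Duplicate column name in table definition: " + c);
            }
        }
    }
}
```

With this pattern the user sees a message explaining what was wrong with the statement, rather than an opaque internal error.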
[jira] [Commented] (CARBONDATA-1742) Fix NullPointerException in SegmentStatusManager
[ https://issues.apache.org/jira/browse/CARBONDATA-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256384#comment-16256384 ] xubo245 commented on CARBONDATA-1742: - It has been added into https://github.com/apache/carbondata/pull/1507/files > Fix NullPointerException in SegmentStatusManager > > > Key: CARBONDATA-1742 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1742 > Project: CarbonData > Issue Type: Bug > Components: core >Affects Versions: 1.2.0 >Reporter: xubo245 >Assignee: xubo245 >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > when loadFolderDetailsArray is null ,there is NullPointerException. We > should fix it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1501: [CARBONDATA-1713] Fixed Aggregate query on main tabl...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1501 Table name and column name are case sensitive, right? ---
[GitHub] carbondata issue #1509: [CARBONDATA-1739] Clean up store path interface
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/1509 I found the commit 'add s3 in filefactory' appears in many prs. ---
[jira] [Updated] (CARBONDATA-1729) The compatibility issue with hadoop <= 2.6 and 2.7
[ https://issues.apache.org/jira/browse/CARBONDATA-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated CARBONDATA-1729: --- Summary: The compatibility issue with hadoop <= 2.6 and 2.7 (was: Fix the compatibility issue with hadoop <= 2.6 and 2.7) > The compatibility issue with hadoop <= 2.6 and 2.7 > -- > > Key: CARBONDATA-1729 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1729 > Project: CarbonData > Issue Type: Bug > Components: hadoop-integration >Affects Versions: 1.3.0 >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang > Fix For: 1.3.0 > > > On branch master, when compiled with hadoop <= 2.6, it failed, the root cause > is using new API FileSystem.truncate which is added in hadoop 2.7. It needs > to implement a method called 'truncate' in file 'FileFactory.java' to support > hadoop <= 2.6. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1750) SegmentStatusManager.readLoadMetadata showing NPE if tablestatus file is empty
QiangCai created CARBONDATA-1750:
---------------------------------

Summary: SegmentStatusManager.readLoadMetadata showing NPE if tablestatus file is empty
Key: CARBONDATA-1750
URL: https://issues.apache.org/jira/browse/CARBONDATA-1750
Project: CarbonData
Issue Type: Bug
Reporter: QiangCai
Priority: Minor

SegmentStatusManager.readLoadMetadata throws an NPE if the tablestatus file is empty.
[jira] [Updated] (CARBONDATA-1729) Fix the compatibility issue with hadoop <= 2.6 and 2.7
[ https://issues.apache.org/jira/browse/CARBONDATA-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated CARBONDATA-1729: --- Summary: Fix the compatibility issue with hadoop <= 2.6 and 2.7 (was: Recover to supporting Hadoop <= 2.6) > Fix the compatibility issue with hadoop <= 2.6 and 2.7 > -- > > Key: CARBONDATA-1729 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1729 > Project: CarbonData > Issue Type: Bug > Components: hadoop-integration >Affects Versions: 1.3.0 >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang > Fix For: 1.3.0 > > > On branch master, when compiled with hadoop <= 2.6, it failed, the root cause > is using new API FileSystem.truncate which is added in hadoop 2.7. It needs > to implement a method called 'truncate' in file 'FileFactory.java' to support > hadoop <= 2.6. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
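For context on the compatibility gap described above: `FileSystem.truncate` only exists from Hadoop 2.7 onwards, so a `truncate` helper in `FileFactory` must fall back to another mechanism on older versions. For local files, truncation can be done with `RandomAccessFile.setLength`; the following is a hedged, self-contained sketch of that fallback, not the actual CarbonData implementation (which also has to handle HDFS and other file types):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class TruncateCompat {
    // Truncate a local file to newLength without using the Hadoop 2.7+
    // FileSystem.truncate API; returns the resulting file length.
    public static long truncate(String path, long newLength) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            raf.setLength(newLength);
        }
        return new File(path).length();
    }

    // Small demo: write 10 bytes, truncate to 4, return final length
    // (or -1 if any I/O error occurred).
    public static long demo() {
        try {
            File f = File.createTempFile("truncate", ".tmp");
            f.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                raf.write(new byte[10]);
            }
            return truncate(f.getAbsolutePath(), 4);
        } catch (IOException e) {
            return -1L;
        }
    }
}
```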
[GitHub] carbondata issue #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain impleme...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1471 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1200/ ---
[GitHub] carbondata issue #1509: [CARBONDATA-1739] Clean up store path interface
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1509 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1199/ ---
[GitHub] carbondata pull request #1504: [CARBONDATA-1732] Add S3 support in FileFacto...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1504 ---
[GitHub] carbondata issue #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain impleme...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1471 retest this please ---
[GitHub] carbondata issue #1509: [CARBONDATA-1739] Clean up store path interface
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1509 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1198/ ---
[GitHub] carbondata issue #1504: [CARBONDATA-1732] Add S3 support in FileFactory
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1504 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1197/ ---
[GitHub] carbondata issue #1504: [CARBONDATA-1732] Add S3 support in FileFactory
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1504 retest this please ---
[jira] [Assigned] (CARBONDATA-1740) Carbon1.3.0-Pre-AggregateTable - Query plan exception for aggregate query with order by when main table is having pre-aggregate table
[ https://issues.apache.org/jira/browse/CARBONDATA-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1740: Assignee: kumar vishal > Carbon1.3.0-Pre-AggregateTable - Query plan exception for aggregate query > with order by when main table is having pre-aggregate table > - > > Key: CARBONDATA-1740 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1740 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: kumar vishal > Labels: DFX > Fix For: 1.3.0 > > > lineitem3: has a pre-aggregate table > select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from > lineitem3 group by l_returnflag, l_linestatus order by l_returnflag, > l_linestatus; > Error: org.apache.spark.sql.AnalysisException: expression > '`lineitem3_l_returnflag`' is neither present in the group by, nor is it an > aggregate function. Add to group by or wrap in first() (or first_value) if > you don't care which value you get.;; > Project [l_returnflag#2356, l_linestatus#2366, sum(l_quantity)#2791, > sum(l_extendedprice)#2792] > +- Sort [aggOrder#2795 ASC NULLS FIRST, aggOrder#2796 ASC NULLS FIRST], true >+- !Aggregate [l_returnflag#2356, l_linestatus#2366], [l_returnflag#2356, > l_linestatus#2366, sum(l_quantity#2362) AS sum(l_quantity)#2791, > sum(l_extendedprice#2363) AS sum(l_extendedprice)#2792, > lineitem3_l_returnflag#2341 AS aggOrder#2795, lineitem3_l_linestatus#2342 AS > aggOrder#2796] > +- SubqueryAlias lineitem3 > +- > Relation[L_SHIPDATE#2353,L_SHIPMODE#2354,L_SHIPINSTRUCT#2355,L_RETURNFLAG#2356,L_RECEIPTDATE#2357,L_ORDERKEY#2358,L_PARTKEY#2359,L_SUPPKEY#2360,L_LINENUMBER#2361,L_QUANTITY#2362,L_EXTENDEDPRICE#2363,L_DISCOUNT#2364,L_TAX#2365,L_LINESTATUS#2366,L_COMMITDATE#2367,L_COMMENT#2368] > CarbonDatasourceHadoopRelation [ Database name :test_db1, Table name > :lineitem3, Schema 
:Some(StructType(StructField(L_SHIPDATE,StringType,true), > StructField(L_SHIPMODE,StringType,true), > StructField(L_SHIPINSTRUCT,StringType,true), > StructField(L_RETURNFLAG,StringType,true), > StructField(L_RECEIPTDATE,StringType,true), > StructField(L_ORDERKEY,StringType,true), > StructField(L_PARTKEY,StringType,true), > StructField(L_SUPPKEY,StringType,true), > StructField(L_LINENUMBER,IntegerType,true), > StructField(L_QUANTITY,DoubleType,true), > StructField(L_EXTENDEDPRICE,DoubleType,true), > StructField(L_DISCOUNT,DoubleType,true), StructField(L_TAX,DoubleType,true), > StructField(L_LINESTATUS,StringType,true), > StructField(L_COMMITDATE,StringType,true), > StructField(L_COMMENT,StringType,true))) ] (state=,code=0) > lineitem4: no pre-aggregate table created > select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from > lineitem4 group by l_returnflag, l_linestatus order by l_returnflag, > l_linestatus; > +---+---+--++--+ > | l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | > +---+---+--++--+ > | A | F | 1.263625E7 | 1.8938515425239815E10 | > | N | F | 327800.0 | 4.91387677622E8| > | N | O | 2.5398626E7 | 3.810981608977963E10 | > | R | F | 1.2643878E7 | 1.8948524305619884E10 | > +---+---+--++--+ > *+Expected:+*: aggregate query with order by should run fine > *+Actual:+* aggregate query with order failed -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-1740) Carbon1.3.0-Pre-AggregateTable - Query plan exception for aggregate query with order by when main table is having pre-aggregate table
[ https://issues.apache.org/jira/browse/CARBONDATA-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255622#comment-16255622 ] kumar vishal commented on CARBONDATA-1740: -- This is failing because of order by in query. In PreAggregate rules order by scenario is not handled > Carbon1.3.0-Pre-AggregateTable - Query plan exception for aggregate query > with order by when main table is having pre-aggregate table > - > > Key: CARBONDATA-1740 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1740 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S > Labels: DFX > Fix For: 1.3.0 > > > lineitem3: has a pre-aggregate table > select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from > lineitem3 group by l_returnflag, l_linestatus order by l_returnflag, > l_linestatus; > Error: org.apache.spark.sql.AnalysisException: expression > '`lineitem3_l_returnflag`' is neither present in the group by, nor is it an > aggregate function. 
Add to group by or wrap in first() (or first_value) if > you don't care which value you get.;; > Project [l_returnflag#2356, l_linestatus#2366, sum(l_quantity)#2791, > sum(l_extendedprice)#2792] > +- Sort [aggOrder#2795 ASC NULLS FIRST, aggOrder#2796 ASC NULLS FIRST], true >+- !Aggregate [l_returnflag#2356, l_linestatus#2366], [l_returnflag#2356, > l_linestatus#2366, sum(l_quantity#2362) AS sum(l_quantity)#2791, > sum(l_extendedprice#2363) AS sum(l_extendedprice)#2792, > lineitem3_l_returnflag#2341 AS aggOrder#2795, lineitem3_l_linestatus#2342 AS > aggOrder#2796] > +- SubqueryAlias lineitem3 > +- > Relation[L_SHIPDATE#2353,L_SHIPMODE#2354,L_SHIPINSTRUCT#2355,L_RETURNFLAG#2356,L_RECEIPTDATE#2357,L_ORDERKEY#2358,L_PARTKEY#2359,L_SUPPKEY#2360,L_LINENUMBER#2361,L_QUANTITY#2362,L_EXTENDEDPRICE#2363,L_DISCOUNT#2364,L_TAX#2365,L_LINESTATUS#2366,L_COMMITDATE#2367,L_COMMENT#2368] > CarbonDatasourceHadoopRelation [ Database name :test_db1, Table name > :lineitem3, Schema :Some(StructType(StructField(L_SHIPDATE,StringType,true), > StructField(L_SHIPMODE,StringType,true), > StructField(L_SHIPINSTRUCT,StringType,true), > StructField(L_RETURNFLAG,StringType,true), > StructField(L_RECEIPTDATE,StringType,true), > StructField(L_ORDERKEY,StringType,true), > StructField(L_PARTKEY,StringType,true), > StructField(L_SUPPKEY,StringType,true), > StructField(L_LINENUMBER,IntegerType,true), > StructField(L_QUANTITY,DoubleType,true), > StructField(L_EXTENDEDPRICE,DoubleType,true), > StructField(L_DISCOUNT,DoubleType,true), StructField(L_TAX,DoubleType,true), > StructField(L_LINESTATUS,StringType,true), > StructField(L_COMMITDATE,StringType,true), > StructField(L_COMMENT,StringType,true))) ] (state=,code=0) > lineitem4: no pre-aggregate table created > select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from > lineitem4 group by l_returnflag, l_linestatus order by l_returnflag, > l_linestatus; > +---+---+--++--+ > | l_returnflag | l_linestatus | sum(l_quantity) | 
sum(l_extendedprice) | > +---+---+--++--+ > | A | F | 1.263625E7 | 1.8938515425239815E10 | > | N | F | 327800.0 | 4.91387677622E8| > | N | O | 2.5398626E7 | 3.810981608977963E10 | > | R | F | 1.2643878E7 | 1.8948524305619884E10 | > +---+---+--++--+ > *+Expected:+*: aggregate query with order by should run fine > *+Actual:+* aggregate query with order failed -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1435: [CARBONDATA-1626]add data size and index size in tab...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1435 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1196/ ---
[GitHub] carbondata issue #1504: [CARBONDATA-1732] Add S3 support in FileFactory
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1504 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1195/ ---
[GitHub] carbondata pull request #1435: [CARBONDATA-1626]add data size and index size...
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1435#discussion_r151458293 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -292,6 +290,35 @@ object CarbonDataRDDFactory { var executorMessage: String = "" val isSortTable = carbonTable.getNumberOfSortColumns > 0 val sortScope = CarbonDataProcessorUtil.getSortScope(carbonLoadModel.getSortScope) + +def updateStatus(status: Array[(String, (LoadMetadataDetails, ExecutionErrors))], --- End diff -- do not update table status file separately in separate method for size, add size while adding loadmetadata details to table status ---
[GitHub] carbondata pull request #1435: [CARBONDATA-1626]add data size and index size...
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1435#discussion_r151446545 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java --- @@ -2119,5 +2127,146 @@ public static String getNewTablePath(Path carbonTablePath, return parentPath.toString() + CarbonCommonConstants.FILE_SEPARATOR + carbonTableIdentifier .getTableName(); } + + /* + * This method will add data size and index size into tablestatus for each segment + */ + public static void addDataIndexSizeIntoMetaEntry(LoadMetadataDetails loadMetadataDetails, + String segmentId, CarbonTable carbonTable) throws IOException { +CarbonTablePath carbonTablePath = + CarbonStorePath.getCarbonTablePath((carbonTable.getAbsoluteTableIdentifier())); +HashMap dataIndexSize = +FileFactory.getDataSizeAndIndexSize(carbonTablePath, segmentId); +loadMetadataDetails + .setDataSize(dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toString()); +loadMetadataDetails + .setIndexSize(dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toString()); + } + + /** + * This method will calculate the data size and index size for carbon table + */ + public static HashMap calculateSize(CarbonTable carbonTable) --- End diff -- Update the method signature Map ---
[GitHub] carbondata pull request #1435: [CARBONDATA-1626]add data size and index size...
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1435#discussion_r151446750 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java --- @@ -2119,5 +2127,146 @@ public static String getNewTablePath(Path carbonTablePath, return parentPath.toString() + CarbonCommonConstants.FILE_SEPARATOR + carbonTableIdentifier .getTableName(); } + + /* + * This method will add data size and index size into tablestatus for each segment + */ + public static void addDataIndexSizeIntoMetaEntry(LoadMetadataDetails loadMetadataDetails, + String segmentId, CarbonTable carbonTable) throws IOException { +CarbonTablePath carbonTablePath = + CarbonStorePath.getCarbonTablePath((carbonTable.getAbsoluteTableIdentifier())); +HashMap dataIndexSize = --- End diff -- Change it to Map ---
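Both of these review comments ask for the usual "program to the interface" convention: declare locals and return types as `Map`, keeping `HashMap` only at the construction site. A tiny illustration (the key names are shortened here; the real code keys on `CarbonCommonConstants` fields):

```java
import java.util.HashMap;
import java.util.Map;

public class DataIndexSize {
    // Return type and local variable use the Map interface; the concrete
    // HashMap appears only where the instance is created, as requested.
    public static Map<String, Long> sizes(long dataSize, long indexSize) {
        Map<String, Long> dataIndexSize = new HashMap<>();
        dataIndexSize.put("CARBON_TOTAL_DATA_SIZE", dataSize);
        dataIndexSize.put("CARBON_TOTAL_INDEX_SIZE", indexSize);
        return dataIndexSize;
    }
}
```

Callers that only read or iterate the result then stay decoupled from the concrete collection type.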
[jira] [Updated] (CARBONDATA-1749) (Carbon1.3.0- DB creation external path) - mdt file is not created in directory as per configuration in carbon.properties
[ https://issues.apache.org/jira/browse/CARBONDATA-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-1749: Description: Steps : In carbon.properties the mdt file directory path is configured as Carbon.update.sync.folder=hdfs://hacluster/user/test1 or /tmp/test1/ In beeline user creates a database by specifying the carbon store path and creates a carbon table in the db. drop database if exists test_db1 cascade; create database test_db1 location 'hdfs://hacluster/user/test1'; use test_db1; create table if not exists ORDERS(O_ORDERDATE string,O_ORDERPRIORITY string,O_ORDERSTATUS string,O_ORDERKEY string,O_CUSTKEY string,O_TOTALPRICE double,O_CLERK string,O_SHIPPRIORITY int,O_COMMENT string) STORED BY 'org.apache.carbondata.format'TBLPROPERTIES ('table_blocksize'='128'); User checks in HDFS UI if the mdt file is created in directory specified (hdfs://hacluster/user/test1) as per configuration in carbon.properties. Issue : mdt file is not created in directory specified (hdfs://hacluster/user/test1) as per configuration in carbon.properties. Also the folder is not created if the user configures the folder path as Carbon.update.sync.folder=/tmp/test1/ Expected : mdt file should be created in directory specified (hdfs://hacluster/user/test1) or /tmp/test1/ as per configuration in carbon.properties. was: Steps : In carbon.properties the mdt file directory path is configured as Carbon.update.sync.folder=hdfs://hacluster/user/test1 In beeline user creates a database by specifying the carbon store path and creates a carbon table in the db. 
drop database if exists test_db1 cascade; create database test_db1 location 'hdfs://hacluster/user/test1'; use test_db1; create table if not exists ORDERS(O_ORDERDATE string,O_ORDERPRIORITY string,O_ORDERSTATUS string,O_ORDERKEY string,O_CUSTKEY string,O_TOTALPRICE double,O_CLERK string,O_SHIPPRIORITY int,O_COMMENT string) STORED BY 'org.apache.carbondata.format'TBLPROPERTIES ('table_blocksize'='128'); User checks in HDFS UI if the mdt file is created in directory specified (hdfs://hacluster/user/test1) as per configuration in carbon.properties. Issue : mdt file is not created in directory specified (hdfs://hacluster/user/test1) as per configuration in carbon.properties. Expected : mdt file should be created in directory specified (hdfs://hacluster/user/test1) as per configuration in carbon.properties. > (Carbon1.3.0- DB creation external path) - mdt file is not created in > directory as per configuration in carbon.properties > - > > Key: CARBONDATA-1749 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1749 > Project: CarbonData > Issue Type: Bug > Components: other >Affects Versions: 1.3.0 > Environment: 3 node cluster >Reporter: Chetan Bhat > Labels: Functional > > Steps : > In carbon.properties the mdt file directory path is configured as > Carbon.update.sync.folder=hdfs://hacluster/user/test1 or /tmp/test1/ > In beeline user creates a database by specifying the carbon store path and > creates a carbon table in the db. 
> drop database if exists test_db1 cascade; > create database test_db1 location 'hdfs://hacluster/user/test1'; > use test_db1; > create table if not exists ORDERS(O_ORDERDATE string,O_ORDERPRIORITY > string,O_ORDERSTATUS string,O_ORDERKEY string,O_CUSTKEY string,O_TOTALPRICE > double,O_CLERK string,O_SHIPPRIORITY int,O_COMMENT string) STORED BY > 'org.apache.carbondata.format'TBLPROPERTIES ('table_blocksize'='128'); > User checks in HDFS UI if the mdt file is created in directory specified > (hdfs://hacluster/user/test1) as per configuration in carbon.properties. > Issue : mdt file is not created in directory specified > (hdfs://hacluster/user/test1) as per configuration in carbon.properties. Also > the folder is not created if the user configures the folder path as > Carbon.update.sync.folder=/tmp/test1/ > Expected : mdt file should be created in directory specified > (hdfs://hacluster/user/test1) or /tmp/test1/ as per configuration in > carbon.properties. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1507: [CARBONDATA-1326] Fixed high priority findbug issue
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1507 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1194/ ---
[jira] [Updated] (CARBONDATA-1748) (Carbon1.3.0- DB creation external path) - Permission of created table and database folder in carbon store not correct
[ https://issues.apache.org/jira/browse/CARBONDATA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-1748: Description: Steps : In spark Beeline user executes the following queries. drop database if exists test_db1 cascade; create database test_db1 location 'hdfs://hacluster/user/test1'; use test_db1; create table if not exists ORDERS(O_ORDERDATE string,O_ORDERPRIORITY string,O_ORDERSTATUS string,O_ORDERKEY string,O_CUSTKEY string,O_TOTALPRICE double,O_CLERK string,O_SHIPPRIORITY int,O_COMMENT string) STORED BY 'org.apache.carbondata.format'TBLPROPERTIES ('table_blocksize'='128'); User checks the permission of the created database and table in carbon store using the bin/hadoop fs -getfacl command. Issue : The Permission of created table and database folder in carbon store not correct. i.e # file: /user/test1/orders # owner: anonymous # group: users user::rwx group::r-x other::r-x Expected : Correct permissions for the created table and database folder in carbon store should be # file: /user/test1/orders # owner: anonymous # group: users user::rwx group::--- other::--- was: Steps : drop database if exists test_db1 cascade; create database test_db1 location 'hdfs://hacluster/user/test1'; use test_db1; create table if not exists ORDERS(O_ORDERDATE string,O_ORDERPRIORITY string,O_ORDERSTATUS string,O_ORDERKEY string,O_CUSTKEY string,O_TOTALPRICE double,O_CLERK string,O_SHIPPRIORITY int,O_COMMENT string) STORED BY 'org.apache.carbondata.format'TBLPROPERTIES ('table_blocksize'='128'); User checks the permission of the created database and table in carbon store using the bin/hadoop fs -getfacl command. Issue : The Permission of created table and database folder in carbon store not correct. 
i.e # file: /user/test1/orders # owner: anonymous # group: users user::rwx group::r-x other::r-x Expected : Correct permissions for the created table and database folder in carbon store should be # file: /user/test1/orders # owner: anonymous # group: users user::rwx group::--- other::--- > (Carbon1.3.0- DB creation external path) - Permission of created table and > database folder in carbon store not correct > -- > > Key: CARBONDATA-1748 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1748 > Project: CarbonData > Issue Type: Bug > Components: other >Affects Versions: 1.3.0 > Environment: 3 node ant cluster >Reporter: Chetan Bhat > Labels: security > > Steps : > In spark Beeline user executes the following queries. > drop database if exists test_db1 cascade; > create database test_db1 location 'hdfs://hacluster/user/test1'; > use test_db1; > create table if not exists ORDERS(O_ORDERDATE string,O_ORDERPRIORITY > string,O_ORDERSTATUS string,O_ORDERKEY string,O_CUSTKEY string,O_TOTALPRICE > double,O_CLERK string,O_SHIPPRIORITY int,O_COMMENT string) STORED BY > 'org.apache.carbondata.format'TBLPROPERTIES ('table_blocksize'='128'); > User checks the permission of the created database and table in carbon store > using the bin/hadoop fs -getfacl command. > Issue : The Permission of created table and database folder in carbon store > not correct. i.e > # file: /user/test1/orders > # owner: anonymous > # group: users > user::rwx > group::r-x > other::r-x > Expected : Correct permissions for the created table and database folder in > carbon store should be > # file: /user/test1/orders > # owner: anonymous > # group: users > user::rwx > group::--- > other::--- -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1749) (Carbon1.3.0- DB creation external path) - mdt file is not created in directory as per configuration in carbon.properties
Chetan Bhat created CARBONDATA-1749: --- Summary: (Carbon1.3.0- DB creation external path) - mdt file is not created in directory as per configuration in carbon.properties Key: CARBONDATA-1749 URL: https://issues.apache.org/jira/browse/CARBONDATA-1749 Project: CarbonData Issue Type: Bug Components: other Affects Versions: 1.3.0 Environment: 3 node cluster Reporter: Chetan Bhat Steps : In carbon.properties the mdt file directory path is configured as Carbon.update.sync.folder=hdfs://hacluster/user/test1 In beeline user creates a database by specifying the carbon store path and creates a carbon table in the db. drop database if exists test_db1 cascade; create database test_db1 location 'hdfs://hacluster/user/test1'; use test_db1; create table if not exists ORDERS(O_ORDERDATE string,O_ORDERPRIORITY string,O_ORDERSTATUS string,O_ORDERKEY string,O_CUSTKEY string,O_TOTALPRICE double,O_CLERK string,O_SHIPPRIORITY int,O_COMMENT string) STORED BY 'org.apache.carbondata.format'TBLPROPERTIES ('table_blocksize'='128'); User checks in HDFS UI if the mdt file is created in directory specified (hdfs://hacluster/user/test1) as per configuration in carbon.properties. Issue : mdt file is not created in directory specified (hdfs://hacluster/user/test1) as per configuration in carbon.properties. Expected : mdt file should be created in directory specified (hdfs://hacluster/user/test1) as per configuration in carbon.properties. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1747) (Carbon1.3.0- DB creation external path) - Owner name of compacted segment and segment after update is not correct
[ https://issues.apache.org/jira/browse/CARBONDATA-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-1747: Description: Steps : In spark Beeline user executes the following queries drop database if exists test_db1 cascade; create database test_db1 location 'hdfs://hacluster/user/test1'; use test_db1; create table if not exists ORDERS(O_ORDERDATE string,O_ORDERPRIORITY string,O_ORDERSTATUS string,O_ORDERKEY string,O_CUSTKEY string,O_TOTALPRICE double,O_CLERK string,O_SHIPPRIORITY int,O_COMMENT string) STORED BY 'org.apache.carbondata.format'TBLPROPERTIES ('table_blocksize'='128'); load data inpath "hdfs://hacluster/chetan/orders.tbl.1" into table ORDERS options('DELIMITER'='|','FILEHEADER'='O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT','batch_sort_size_inmb'='32'); load data inpath "hdfs://hacluster/chetan/orders.tbl.1" into table ORDERS options('DELIMITER'='|','FILEHEADER'='O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT','batch_sort_size_inmb'='32'); load data inpath "hdfs://hacluster/chetan/orders.tbl.1" into table ORDERS options('DELIMITER'='|','FILEHEADER'='O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT','batch_sort_size_inmb'='32'); load data inpath "hdfs://hacluster/chetan/orders.tbl.1" into table ORDERS options('DELIMITER'='|','FILEHEADER'='O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT','batch_sort_size_inmb'='32'); alter table ORDERS compact 'major'; update orders set (O_ORDERKEY)=(1) where O_CUSTKEY=6259021; After compaction and update user checks the Owner name of compacted segment and segment name after update in HDFS UI. Issue : In HDFS UI before compaction and update the owner name of the existing segment folders was "anonymous". 
After compaction and update the owner name of the compacted segment folder and segment which is impacted by update is displayed as "root". Expected : After compaction and update the owner name of the compacted segment folder and segment which is impacted by update should be "anonymous". was: Steps : User executes the following queries drop database if exists test_db1 cascade; create database test_db1 location 'hdfs://hacluster/user/test1'; use test_db1; create table if not exists ORDERS(O_ORDERDATE string,O_ORDERPRIORITY string,O_ORDERSTATUS string,O_ORDERKEY string,O_CUSTKEY string,O_TOTALPRICE double,O_CLERK string,O_SHIPPRIORITY int,O_COMMENT string) STORED BY 'org.apache.carbondata.format'TBLPROPERTIES ('table_blocksize'='128'); load data inpath "hdfs://hacluster/chetan/orders.tbl.1" into table ORDERS options('DELIMITER'='|','FILEHEADER'='O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT','batch_sort_size_inmb'='32'); load data inpath "hdfs://hacluster/chetan/orders.tbl.1" into table ORDERS options('DELIMITER'='|','FILEHEADER'='O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT','batch_sort_size_inmb'='32'); load data inpath "hdfs://hacluster/chetan/orders.tbl.1" into table ORDERS options('DELIMITER'='|','FILEHEADER'='O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT','batch_sort_size_inmb'='32'); load data inpath "hdfs://hacluster/chetan/orders.tbl.1" into table ORDERS options('DELIMITER'='|','FILEHEADER'='O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT','batch_sort_size_inmb'='32'); alter table ORDERS compact 'major'; update orders set (O_ORDERKEY)=(1) where O_CUSTKEY=6259021; After compaction and update user checks the Owner name of compacted segment and segment name after update in HDFS UI. 
Issue : In HDFS UI before compaction and update the owner name of the existing segment folders was "anonymous". After compaction and update the owner name of the compacted segment folder and segment which is impacted by update is displayed as "root". Expected : After compaction and update the owner name of the compacted segment folder and segment which is impacted by update should be "anonymous". > (Carbon1.3.0- DB creation external path) - Owner name of compacted segment > and segment after update is not correct > -- > > Key: CARBONDATA-1747 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1747 > Project: CarbonData > Issue Type: Bug > Components: other >Affects Versions: 1.3.0 > Environment: 3 node ant cluster >Reporter: Chetan Bhat > Labels:
[GitHub] carbondata issue #1513: [CARBONDATA-1745] Use default metastore path from Hi...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1513 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1193/ ---
[GitHub] carbondata issue #1514: [CARBONDATA-1746] Count star optimization
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1514 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1192/ ---
[GitHub] carbondata issue #1435: [CARBONDATA-1626]add data size and index size in tab...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1435 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1191/ ---
[GitHub] carbondata issue #1512: [CARBONDATA-1742] Fix NullPointerException in Segmen...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1512 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1190/ ---
[GitHub] carbondata issue #1513: [CARBONDATA-1745] Use default metastore path from Hi...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1513 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1189/ ---
[GitHub] carbondata pull request #1505: [CARBONDATA-1733] While load is in progress, ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1505 ---
[GitHub] carbondata issue #1494: [CARBONDATA-1706] Making index merge DDL insensitive...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1494 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1188/ ---
[GitHub] carbondata issue #1505: [CARBONDATA-1733] While load is in progress, Show se...
Github user kumarvishal09 commented on the issue: https://github.com/apache/carbondata/pull/1505 LGTM ---
[jira] [Updated] (CARBONDATA-1748) (Carbon1.3.0- DB creation external path) - Permission of created table and database folder in carbon store not correct
[ https://issues.apache.org/jira/browse/CARBONDATA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-1748: Description: Steps : drop database if exists test_db1 cascade; create database test_db1 location 'hdfs://hacluster/user/test1'; use test_db1; create table if not exists ORDERS(O_ORDERDATE string,O_ORDERPRIORITY string,O_ORDERSTATUS string,O_ORDERKEY string,O_CUSTKEY string,O_TOTALPRICE double,O_CLERK string,O_SHIPPRIORITY int,O_COMMENT string) STORED BY 'org.apache.carbondata.format'TBLPROPERTIES ('table_blocksize'='128'); User checks the permission of the created database and table in carbon store using the bin/hadoop fs -getfacl command. Issue : The Permission of created table and database folder in carbon store not correct. i.e # file: /user/test1/orders # owner: anonymous # group: users user::rwx group::r-x other::r-x Expected : Correct permissions for the created table and database folder in carbon store should be # file: /user/test1/orders # owner: anonymous # group: users user::rwx group::--- other::--- was: Steps : drop database if exists test_db1 cascade; create database test_db1 location 'hdfs://hacluster/user/test1'; use test_db1; create table if not exists ORDERS(O_ORDERDATE string,O_ORDERPRIORITY string,O_ORDERSTATUS string,O_ORDERKEY string,O_CUSTKEY string,O_TOTALPRICE double,O_CLERK string,O_SHIPPRIORITY int,O_COMMENT string) STORED BY 'org.apache.carbondata.format'TBLPROPERTIES ('table_blocksize'='128'); User checks the permission of the created database and table in carbon store using the bin/hadoop fs -getfacl command. Issue : The Permission of created table and database folder in carbon store not correct. 
i.e # file: /user/test1/orders # owner: anonymous # group: users user::rwx group::r-x other::r-x Expected : Correct permissions should be # file: /user/test1/orders # owner: anonymous # group: users user::rwx group::--- other::--- > (Carbon1.3.0- DB creation external path) - Permission of created table and > database folder in carbon store not correct > -- > > Key: CARBONDATA-1748 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1748 > Project: CarbonData > Issue Type: Bug > Components: other >Affects Versions: 1.3.0 > Environment: 3 node ant cluster >Reporter: Chetan Bhat > Labels: security > > Steps : > drop database if exists test_db1 cascade; > create database test_db1 location 'hdfs://hacluster/user/test1'; > use test_db1; > create table if not exists ORDERS(O_ORDERDATE string,O_ORDERPRIORITY > string,O_ORDERSTATUS string,O_ORDERKEY string,O_CUSTKEY string,O_TOTALPRICE > double,O_CLERK string,O_SHIPPRIORITY int,O_COMMENT string) STORED BY > 'org.apache.carbondata.format'TBLPROPERTIES ('table_blocksize'='128'); > User checks the permission of the created database and table in carbon store > using the bin/hadoop fs -getfacl command. > Issue : The Permission of created table and database folder in carbon store > not correct. i.e > # file: /user/test1/orders > # owner: anonymous > # group: users > user::rwx > group::r-x > other::r-x > Expected : Correct permissions for the created table and database folder in > carbon store should be > # file: /user/test1/orders > # owner: anonymous > # group: users > user::rwx > group::--- > other::--- -- This message was sent by Atlassian JIRA (v6.4.14#64029)
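For context, the reported-vs-expected ACLs above can be checked programmatically. Below is a minimal Python sketch — a hypothetical helper, not CarbonData code — that parses `hadoop fs -getfacl` output and flags the permissive group/other entries this issue describes:

```python
# Illustrative sketch (not CarbonData code): parse `hadoop fs -getfacl` output
# and flag group/other entries that grant any access. The function name and
# the embedded sample output are taken from the issue report above.

def find_permissive_entries(getfacl_output: str):
    """Return ACL entries granting group/other any access on the path."""
    permissive = []
    for line in getfacl_output.splitlines():
        line = line.strip()
        # ACL entry lines look like "group::r-x" or "other::---"
        if line.startswith(("group::", "other::")):
            who, _, perms = line.split(":", 2)
            if perms != "---":
                permissive.append((who, perms))
    return permissive

# Actual output reported in the issue: group and other can read/execute.
actual = """# file: /user/test1/orders
# owner: anonymous
# group: users
user::rwx
group::r-x
other::r-x"""

print(find_permissive_entries(actual))  # [('group', 'r-x'), ('other', 'r-x')]
```

On the expected ACLs (`group::---`, `other::---`) the helper would return an empty list.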
[jira] [Created] (CARBONDATA-1748) (Carbon1.3.0- DB creation external path) - Permission of created table and database folder in carbon store not correct
Chetan Bhat created CARBONDATA-1748: --- Summary: (Carbon1.3.0- DB creation external path) - Permission of created table and database folder in carbon store not correct Key: CARBONDATA-1748 URL: https://issues.apache.org/jira/browse/CARBONDATA-1748 Project: CarbonData Issue Type: Bug Components: other Affects Versions: 1.3.0 Environment: 3 node ant cluster Reporter: Chetan Bhat Steps : drop database if exists test_db1 cascade; create database test_db1 location 'hdfs://hacluster/user/test1'; use test_db1; create table if not exists ORDERS(O_ORDERDATE string,O_ORDERPRIORITY string,O_ORDERSTATUS string,O_ORDERKEY string,O_CUSTKEY string,O_TOTALPRICE double,O_CLERK string,O_SHIPPRIORITY int,O_COMMENT string) STORED BY 'org.apache.carbondata.format'TBLPROPERTIES ('table_blocksize'='128'); User checks the permission of the created database and table in carbon store using the bin/hadoop fs -getfacl command. Issue : The Permission of created table and database folder in carbon store not correct. i.e # file: /user/test1/orders # owner: anonymous # group: users user::rwx group::r-x other::r-x Expected : Correct permissions should be # file: /user/test1/orders # owner: anonymous # group: users user::rwx group::--- other::--- -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1432: [CARBONDATA-1608]Support Column Comment for C...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1432 ---
[jira] [Assigned] (CARBONDATA-1743) Carbon1.3.0-Pre-AggregateTable - Query returns no value if run at the time of pre-aggregate table creation
[ https://issues.apache.org/jira/browse/CARBONDATA-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor reassigned CARBONDATA-1743: Assignee: Kunal Kapoor > Carbon1.3.0-Pre-AggregateTable - Query returns no value if run at the time of > pre-aggregate table creation > -- > > Key: CARBONDATA-1743 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1743 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: Kunal Kapoor > Labels: DFX > Fix For: 1.3.0 > > > Steps: > 1. Create table and load with large data > create table if not exists lineitem4(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table > lineitem4 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 2. Create a pre-aggregate table > create datamap agr_lineitem4 ON TABLE lineitem4 USING > "org.apache.carbondata.datamap.AggregateDataMapHandler" as select > L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from lineitem4 > group by L_RETURNFLAG, L_LINESTATUS; > 3. 
Run aggregate query at the same time
> select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem4 group by l_returnflag, l_linestatus;
> *+Expected:+* aggregate query should fetch data either from main table or pre-aggregate table.
> *+Actual:+* aggregate query does not return data until the pre-aggregate table is created
> 0: jdbc:hive2://10.18.98.48:23040> select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem4 group by l_returnflag, l_linestatus;
> +---------------+---------------+------------------+------------------------+
> | l_returnflag  | l_linestatus  | sum(l_quantity)  | sum(l_extendedprice)   |
> +---------------+---------------+------------------+------------------------+
> +---------------+---------------+------------------+------------------------+
> No rows selected (1.74 seconds)
> 0: jdbc:hive2://10.18.98.48:23040> select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem4 group by l_returnflag, l_linestatus;
> +---------------+---------------+------------------+------------------------+
> | l_returnflag  | l_linestatus  | sum(l_quantity)  | sum(l_extendedprice)   |
> +---------------+---------------+------------------+------------------------+
> +---------------+---------------+------------------+------------------------+
> No rows selected (0.746 seconds)
> 0: jdbc:hive2://10.18.98.48:23040> select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem4 group by l_returnflag, l_linestatus;
> +---------------+---------------+------------------+------------------------+
> | l_returnflag  | l_linestatus  | sum(l_quantity)  | sum(l_extendedprice)   |
> +---------------+---------------+------------------+------------------------+
> | N             | F             | 2.9808092E7      | 4.471079473931997E10   |
> | A             | F             | 1.145546488E9    | 1.717580824169429E12   |
> | N             | O             | 2.31980219E9     | 3.4789002701143467E12  |
> | R             | F             | 1.146403932E9    | 1.7190627928317903E12  |
> +---------------+---------------+------------------+------------------------+
> 4 rows selected (0.8 seconds)
> 0: jdbc:hive2://10.18.98.48:23040> select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem4 group by l_returnflag, l_linestatus;
> +---------------+---------------+------------------+------------------------+
> | l_returnflag  | l_linestatus  | sum(l_quantity)  | sum(l_extendedprice)   |
> +---------------+---------------+------------------+------------------------+
> | N             | F             | 2.9808092E7      | 4.471079473931997E10   |
> | A             | F             | 1.145546488E9    | 1.717580824169429E12   |
> | N             | O             | 2.31980219E9
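The expected behavior described above — queries falling back to the main table while the pre-aggregate table is still being built — amounts to a simple routing rule in the planner. Below is an illustrative Python sketch of that rule, with hypothetical names and status values; it is not CarbonData's actual planner code:

```python
# Illustrative sketch (hypothetical names and statuses, not CarbonData's
# planner code): route an aggregate query to the pre-aggregate table only
# once the datamap is fully built; otherwise fall back to the main table.

def choose_table(main_table: str, datamaps: dict) -> str:
    """datamaps maps pre-agg table name -> status ('BUILDING' or 'ENABLED')."""
    for name, status in datamaps.items():
        if status == "ENABLED":
            return name       # safe to answer from the pre-aggregate table
    return main_table         # under construction (or absent): scan main table

# While agr_lineitem4 is still building, queries should hit lineitem4 itself
# rather than returning an empty result from the half-built table.
print(choose_table("lineitem4", {"agr_lineitem4": "BUILDING"}))  # lineitem4
print(choose_table("lineitem4", {"agr_lineitem4": "ENABLED"}))   # agr_lineitem4
```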
[jira] [Created] (CARBONDATA-1747) (Carbon1.3.0- DB creation external path) - Owner name of compacted segment and segment after update is not correct
Chetan Bhat created CARBONDATA-1747: --- Summary: (Carbon1.3.0- DB creation external path) - Owner name of compacted segment and segment after update is not correct Key: CARBONDATA-1747 URL: https://issues.apache.org/jira/browse/CARBONDATA-1747 Project: CarbonData Issue Type: Bug Components: other Affects Versions: 1.3.0 Environment: 3 node ant cluster Reporter: Chetan Bhat Steps : User executes the following queries drop database if exists test_db1 cascade; create database test_db1 location 'hdfs://hacluster/user/test1'; use test_db1; create table if not exists ORDERS(O_ORDERDATE string,O_ORDERPRIORITY string,O_ORDERSTATUS string,O_ORDERKEY string,O_CUSTKEY string,O_TOTALPRICE double,O_CLERK string,O_SHIPPRIORITY int,O_COMMENT string) STORED BY 'org.apache.carbondata.format'TBLPROPERTIES ('table_blocksize'='128'); load data inpath "hdfs://hacluster/chetan/orders.tbl.1" into table ORDERS options('DELIMITER'='|','FILEHEADER'='O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT','batch_sort_size_inmb'='32'); load data inpath "hdfs://hacluster/chetan/orders.tbl.1" into table ORDERS options('DELIMITER'='|','FILEHEADER'='O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT','batch_sort_size_inmb'='32'); load data inpath "hdfs://hacluster/chetan/orders.tbl.1" into table ORDERS options('DELIMITER'='|','FILEHEADER'='O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT','batch_sort_size_inmb'='32'); load data inpath "hdfs://hacluster/chetan/orders.tbl.1" into table ORDERS options('DELIMITER'='|','FILEHEADER'='O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT','batch_sort_size_inmb'='32'); alter table ORDERS compact 'major'; update orders set (O_ORDERKEY)=(1) where O_CUSTKEY=6259021; After compaction and update user checks the Owner name of compacted 
segment and segment name after update in HDFS UI. Issue : In HDFS UI before compaction and update the owner name of the existing segment folders was "anonymous". After compaction and update the owner name of the compacted segment folder and segment which is impacted by update is displayed as "root". Expected : After compaction and update the owner name of the compacted segment folder and segment which is impacted by update should be "anonymous". -- This message was sent by Atlassian JIRA (v6.4.14#64029)
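The owner mismatch described above can be spotted by scanning the HDFS listing of the table's segment directories. Below is a minimal Python sketch — a hypothetical helper, not CarbonData code — that parses `hdfs dfs -ls` output and flags directories whose owner differs from the expected user:

```python
# Illustrative sketch (hypothetical helper, not CarbonData code): parse
# `hdfs dfs -ls` output and flag segment directories whose owner differs
# from the expected user, as reported after compaction/update.

def segments_with_wrong_owner(ls_output: str, expected_owner: str):
    wrong = []
    for line in ls_output.splitlines():
        parts = line.split()
        # ls lines: perms replication owner group size date time path
        if len(parts) >= 8 and parts[0].startswith("d"):
            owner, path = parts[2], parts[-1]
            if owner != expected_owner:
                wrong.append((path, owner))
    return wrong

# Sample listing modeled on the issue: the compacted segment ends up as root.
listing = """drwxr-xr-x   - anonymous users 0 2017-11-16 10:00 /user/test1/orders/Fact/Part0/Segment_0
drwxr-xr-x   - root      users 0 2017-11-16 10:05 /user/test1/orders/Fact/Part0/Segment_0.1"""

print(segments_with_wrong_owner(listing, "anonymous"))
```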
[GitHub] carbondata issue #1504: [CARBONDATA-1732] Add S3 support in FileFactory
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1504 retest this please ---
[GitHub] carbondata pull request #1514: [CARBONDATA-1746] Count star optimization
GitHub user jackylk opened a pull request: https://github.com/apache/carbondata/pull/1514

[CARBONDATA-1746] Count star optimization

Since carbon records the number of rows in its metadata, count star queries can leverage it to improve performance.

- [X] Any interfaces changed? No
- [X] Any backward compatibility impacted? No
- [X] Document update required? No
- [X] Testing done? No testcase added
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. MR38

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata count_star

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1514.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1514

commit 09ff30688fe84c199fb86b92b9547c859e05a75c Author: Jacky Li Date: 2017-11-16T09:27:21Z add s3 in filefactory
commit 8d31b0974071bb7cd8ad72aa58990b9f2621b8a1 Author: Jacky Li Date: 2017-11-16T11:41:19Z remove unnecessary path
commit 5a7008e0691daff9bad3c8bf707d0592500a1f24 Author: Jacky Li Date: 2017-11-16T12:56:55Z clean CarbonEnv
commit aeee0e5f7df61f1add1dca30be0722aab0a8d2dd Author: Jacky Li Date: 2017-11-16T13:23:22Z remove AKSK in log
commit fc3cf73e7d13212faa9ca4502d09c637c57ff970 Author: Jacky Li Date: 2017-11-16T13:57:54Z change default metastore path
commit 7a4c77526b97b6cc5e9bd286dd3701f4d1ba86c5 Author: Jacky Li Date: 2017-11-16T14:57:07Z fix testcase
commit 90b1841200c1086a2567e54787750b970208ed13 Author: Jacky Li Date: 2017-11-16T14:53:02Z add count star optimization ---
[jira] [Created] (CARBONDATA-1746) Count Star optimization
Jacky Li created CARBONDATA-1746: Summary: Count Star optimization Key: CARBONDATA-1746 URL: https://issues.apache.org/jira/browse/CARBONDATA-1746 Project: CarbonData Issue Type: New Feature Reporter: Jacky Li Assignee: Jacky Li Fix For: 1.3.0 Since carbon records the number of rows in its metadata, count star queries can leverage it to improve performance -- This message was sent by Atlassian JIRA (v6.4.14#64029)
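The idea behind this optimization can be sketched in a few lines: answer `select count(*)` by summing the per-segment row counts already stored in table metadata, rather than scanning data files. Below is an illustrative Python sketch with hypothetical metadata structures; it is not the actual implementation in the pull request:

```python
# Illustrative sketch (hypothetical structures, not the actual PR code):
# answer `select count(*)` from per-segment row counts kept in table
# metadata instead of scanning data files.

def count_star_from_metadata(segments):
    """segments: list of dicts with 'status' and 'row_count' per load segment."""
    # Only successfully loaded segments contribute rows to the table.
    return sum(s["row_count"] for s in segments if s["status"] == "SUCCESS")

segments = [
    {"status": "SUCCESS", "row_count": 1_000_000},
    {"status": "SUCCESS", "row_count": 750_000},
    {"status": "MARKED_FOR_DELETE", "row_count": 500_000},
]
print(count_star_from_metadata(segments))  # 1750000
```

The design choice is the usual metadata-vs-scan trade-off: the counts are maintained at load time, so a count-star query becomes a metadata lookup instead of a full table scan.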