[jira] [Created] (CARBONDATA-676) Code clean
zhangshunyu created CARBONDATA-676: -- Summary: Code clean Key: CARBONDATA-676 URL: https://issues.apache.org/jira/browse/CARBONDATA-676 Project: CarbonData Issue Type: Improvement Reporter: zhangshunyu Assignee: zhangshunyu Priority: Minor Clean up some code: correct spelling mistakes, remove unused functions, and iterate over the Array directly instead of transforming it to a List. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
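The last cleanup item can be illustrated with a minimal sketch. The class and method names below are hypothetical and are not taken from the CarbonData code base; the point is only that an enhanced for loop works on an array without wrapping it in a List first.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not CarbonData code.
final class ArrayIterationSketch {

  // Before: wraps the array in a List just to loop over it.
  static int totalLengthViaList(String[] columns) {
    List<String> asList = Arrays.asList(columns);
    int total = 0;
    for (String column : asList) {
      total += column.length();
    }
    return total;
  }

  // After: iterates the array directly, with no intermediate List.
  static int totalLength(String[] columns) {
    int total = 0;
    for (String column : columns) {
      total += column.length();
    }
    return total;
  }
}
```

Both methods return the same result; the second simply avoids the extra wrapper object.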
[jira] [Created] (CARBONDATA-451) Can not run query on windows now
zhangshunyu created CARBONDATA-451: -- Summary: Can not run query on windows now Key: CARBONDATA-451 URL: https://issues.apache.org/jira/browse/CARBONDATA-451 Project: CarbonData Issue Type: Bug Components: core Reporter: zhangshunyu Assignee: zhangshunyu Fix For: 0.2.0-incubating Because the tablePath on Windows contains '/' and it is not replaced before the substring operation, an error is thrown when a query is executed. I have fixed this and will raise a PR.
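The issue does not show the affected code, so the following is only a sketch of how mixed path separators can break substring logic and one possible way to guard against it. The class `PathSketch` and both methods are hypothetical; the actual fix in CarbonData may look different.

```java
// Hypothetical helper; the real fix in CarbonData may look different.
final class PathSketch {

  // Converts Windows-style '\' separators to '/' so that prefix
  // comparisons and substring offsets line up.
  static String normalize(String path) {
    return path.replace('\\', '/');
  }

  // Returns the part of filePath below tablePath, or throws if filePath
  // is not located under tablePath.
  static String relativeToTable(String tablePath, String filePath) {
    String table = normalize(tablePath);
    String file = normalize(filePath);
    if (!file.startsWith(table + "/")) {
      throw new IllegalArgumentException(file + " is not under " + table);
    }
    return file.substring(table.length() + 1);
  }
}
```

Normalizing both paths before comparing them means it no longer matters which separator each caller used.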
[GitHub] incubator-carbondata pull request #319: [CARBONDATA-411] Test
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/319 [CARBONDATA-411] Test test You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata a Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/319.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #319 commit e57977da9fa64e87d1e54f84c35ad718a7701ec9 Author: zhaow <zhaow@zhaowdemacbook-pro.local> Date: 2016-11-16T01:19:16Z add sth --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (CARBONDATA-411) test
zhangshunyu created CARBONDATA-411: -- Summary: test Key: CARBONDATA-411 URL: https://issues.apache.org/jira/browse/CARBONDATA-411 Project: CarbonData Issue Type: Improvement Components: core Reporter: zhangshunyu Priority: Minor Fix For: 0.2.0-incubating
[GitHub] incubator-carbondata pull request #309: [CARBONDATA-402] support CreateAsSel...
Github user Zhangshunyu commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/309#discussion_r87588006

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -1375,6 +1376,24 @@ private[sql] case class ShowLoads(
 }

+private[sql] case class CreateCarbonTableAsSelect(
+    databaseName: Option[String],
+    tableName: String,
+    allowExisting: Boolean,
+    createSql: String) extends RunnableCommand {
+
+  override def run(sqlContext: SQLContext): Seq[Row] = {
+    val dbName = getDB.getDatabaseName(databaseName, sqlContext)
+    val subQueryIndex = createSql.toUpperCase.indexOf("SELECT")
--- End diff --

When the table name is something like "selectabcd", would the index still be the right one?
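The concern raised in this review comment can be reproduced with a small sketch, written in Java for illustration although the diff itself is Scala. The class `SelectIndexSketch` and its method are hypothetical. A plain `indexOf("SELECT")` on the upper-cased statement matches inside an identifier such as `selectabcd`, whereas a word-boundary regex only matches the standalone keyword.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the code that was merged.
final class SelectIndexSketch {

  // \b requires a word boundary on both sides, so "selectabcd" and
  // "select_abc" are skipped while a standalone SELECT is found.
  private static final Pattern SELECT_KEYWORD =
      Pattern.compile("\\bSELECT\\b", Pattern.CASE_INSENSITIVE);

  // Returns the index of the first standalone SELECT keyword, or -1.
  static int indexOfSelectKeyword(String sql) {
    Matcher matcher = SELECT_KEYWORD.matcher(sql);
    return matcher.find() ? matcher.start() : -1;
  }
}
```

For `CREATE TABLE selectabcd AS SELECT * FROM t`, the naive search returns the offset of the table name, while the regex returns the offset of the sub-query. This is still not a parser: a table literally named `select`, or the word inside a string literal, would be matched as well.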
[GitHub] incubator-carbondata pull request #259: Fix constants and method names
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/259

Fix constants and method names

## Why raise this pr?
To rename some constants and method names whose purpose is unclear. For example: it is hard to tell what the parameter 'carbon.number.of.cores' is used for (cores for what?), and it is hard to tell what the method 'getNumberOfCores' refers to (query cores or load cores?).

## How to test?
Pass all the test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata constants

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/259.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #259

commit 8a3c1b4758a93d7e5b7c1d983f9a9309995f4c79
Author: Zhangshunyu <zhangshu...@huawei.com>
Date: 2016-10-26T13:53:21Z

Fix constans
[GitHub] incubator-carbondata pull request #222: [CARBONDATA-221] Fix the bug of inve...
GitHub user Zhangshunyu reopened a pull request:

https://github.com/apache/incubator-carbondata/pull/222

[CARBONDATA-221] Fix the bug of inverted index that store inverted index in metadata by using Encoding.INVERTED_INDEX.

## Why raise this pr?
1. Problem: in the current code, the inverted index setting from the DDL is not persisted in the store, so after the cluster restarts, queries might return mismatched results.
2. To work around problem 1, the current code always sets the inverted index to true, so the inverted index can no longer be configured. We should fix this problem at its root cause.

## How to solve?
Use the Encoding as the identifier to check whether the inverted index is used. This Encoding is already part of the thrift format, so there is no need to modify the thrift format. This is the same as the query logic in CompressedDimensionChunkFileBasedReader:
```
if (CarbonUtil.hasEncoding(dimensionColumnChunk.get(blockIndex).getEncodingList(),
    Encoding.INVERTED_INDEX)) {
  invertedIndexes = CarbonUtil
      .getUnCompressColumnIndex(dimensionColumnChunk.get(blockIndex).getRowIdPageLength(),
          fileReader.readByteArray(filePath,
              dimensionColumnChunk.get(blockIndex).getRowIdPageOffset(),
              dimensionColumnChunk.get(blockIndex).getRowIdPageLength()), numberComressor);
  // get the reverse index
  invertedIndexesReverse = getInvertedReverseIndex(invertedIndexes);
}
```
It also uses Encoding.INVERTED_INDEX to check whether a column uses the inverted index.

## How to test?
Pass all the test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata fix_index

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/222.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #222

commit c27a8a9e33529e53020c477c70d0c079724070d2
Author: Zhangshunyu <zhangshu...@huawei.com>
Date: 2016-09-08T07:48:03Z

Save useInvertedIndex info into thrift store

commit 3c8da81869e1a8eca8bdde3d82bc0a9d185bdc3d
Author: Zhangshunyu <zhangshu...@huawei.com>
Date: 2016-09-08T07:48:15Z

Save useInvertedIndex info into thrift store

commit b834e4889f5c5eadcee1c232c1a6070df0c1bf60
Author: Zhangshunyu <zhangshu...@huawei.com>
Date: 2016-09-08T09:46:12Z

Fix the judge of no_dic_col

commit e8b338c2a7a9e3e28a591bdfe57a5f704f1496d6
Author: Zhangshunyu <zhangshu...@huawei.com>
Date: 2016-09-08T10:04:20Z

add commont
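The idea of using an encoding list as the single source of truth can be sketched as follows. The enum below is only a stand-in for the thrift-generated Encoding type (its values are assumed), and `hasEncoding` is a simplified guess at what a helper like `CarbonUtil.hasEncoding` does, not its actual implementation.

```java
import java.util.List;

// Stand-in types for illustration; not the thrift-generated classes.
final class EncodingSketch {

  // Assumed subset of encodings, for illustration only.
  enum Encoding { DICTIONARY, INVERTED_INDEX }

  // True if the column's persisted encoding list contains the wanted
  // encoding. A null list is treated as "no encodings".
  static boolean hasEncoding(List<Encoding> encodings, Encoding wanted) {
    return encodings != null && encodings.contains(wanted);
  }
}
```

Because the encoding list is persisted with the column metadata, the load path and the query path can both derive the same answer from it after a restart, which is the property the PR description relies on.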
[GitHub] incubator-carbondata pull request #222: [CARBONDATA-221] Fix the bug of inve...
Github user Zhangshunyu closed the pull request at: https://github.com/apache/incubator-carbondata/pull/222
[GitHub] incubator-carbondata pull request #230: [CARBONDATA-306]Add block size info ...
Github user Zhangshunyu commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/230#discussion_r83139950

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java ---
@@ -252,6 +252,9 @@ private static long getMaxOfBlockAndFileSize(long blockSize, long fileSize) {
     if (remainder > 0) {
       maxSize = maxSize + HDFS_CHECKSUM_LENGTH - remainder;
     }
+    LOGGER.info("The configured block size is " + blockSize + " byte, " +
--- End diff --

@Jay357089 I think it is a good idea to extract ConvertByteToReadable as a method, since it can be used in many logs, especially when analyzing performance.
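A helper of the kind suggested in this comment might look like the sketch below. The class name, method name, unit labels, and two-decimal formatting are all assumptions made for illustration; the thread does not show the method that was eventually written.

```java
import java.util.Locale;

// Hypothetical helper for log messages; names and format are assumed.
final class ByteSizeSketch {

  private static final String[] UNITS = {"Byte", "KB", "MB", "GB", "TB"};

  // Formats a byte count using the largest unit that keeps the value >= 1.
  static String toReadable(long bytes) {
    if (bytes < 0) {
      throw new IllegalArgumentException("size must not be negative: " + bytes);
    }
    double value = bytes;
    int unit = 0;
    while (value >= 1024 && unit < UNITS.length - 1) {
      value /= 1024;
      unit++;
    }
    if (unit == 0) {
      return bytes + " " + UNITS[0];
    }
    return String.format(Locale.ROOT, "%.2f %s", value, UNITS[unit]);
  }
}
```

A log statement could then print `ByteSizeSketch.toReadable(blockSize)` instead of a raw byte count.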
[GitHub] incubator-carbondata pull request #231: [CARBONDATA-311]Log the data size of...
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/231

[CARBONDATA-311]Log the data size of blocklet during data load.

## Why raise this pr?
The blocklet size is an important parameter for analyzing data load and query, so this information should be logged.

## How to test?
Pass all the test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata logblocklet

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/231.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #231

commit a110504f58e688e42223e896f7a1cf729463cf9d
Author: Zhangshunyu <zhangshu...@huawei.com>
Date: 2016-10-13T03:17:21Z

Log the data size of each blocklet
[jira] [Created] (CARBONDATA-311) Log the data size of blocklet during data load.
zhangshunyu created CARBONDATA-311: -- Summary: Log the data size of blocklet during data load. Key: CARBONDATA-311 URL: https://issues.apache.org/jira/browse/CARBONDATA-311 Project: CarbonData Issue Type: Improvement Affects Versions: 0.1.1-incubating Reporter: zhangshunyu Assignee: zhangshunyu Priority: Minor Fix For: 0.2.0-incubating Log the data size of blocklet during data load.
[GitHub] incubator-carbondata pull request #230: [CARBONDATA-306]Add block size info ...
Github user Zhangshunyu commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/230#discussion_r83027603

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java ---
@@ -252,6 +252,9 @@ private static long getMaxOfBlockAndFileSize(long blockSize, long fileSize) {
     if (remainder > 0) {
       maxSize = maxSize + HDFS_CHECKSUM_LENGTH - remainder;
     }
+    LOGGER.info("The configured block size is " + blockSize + " byte, " +
--- End diff --

@jackylk It is set in MB, but here it has already been converted to bytes.
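The two steps touched on in this thread, converting a size configured in MB to bytes and rounding up to a multiple of the checksum length, can be sketched as below. The diff only shows the `if (remainder > 0)` branch, so the modulo computation is inferred from it, and the value 512 for `HDFS_CHECKSUM_LENGTH` is an assumption for illustration.

```java
// Sketch only; constant value and remainder computation are assumed.
final class BlockSizeSketch {

  // Assumed value for illustration; the real constant is defined in
  // AbstractFactDataWriter.
  static final long HDFS_CHECKSUM_LENGTH = 512L;

  // Converts a size configured in MB to bytes.
  static long mbToBytes(long sizeInMb) {
    return sizeInMb * 1024L * 1024L;
  }

  // Rounds size up to the next multiple of HDFS_CHECKSUM_LENGTH,
  // mirroring the branch shown in the diff.
  static long roundUpToChecksumMultiple(long size) {
    long remainder = size % HDFS_CHECKSUM_LENGTH;
    if (remainder > 0) {
      return size + HDFS_CHECKSUM_LENGTH - remainder;
    }
    return size;
  }
}
```

With these assumptions, a configured value of 1024 MB is 1073741824 bytes, which is already a multiple of 512 and is therefore left unchanged by the rounding step.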
[GitHub] incubator-carbondata pull request #224: [CARBONDATA-239]Add scan_blocklet_nu...
Github user Zhangshunyu commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/224#discussion_r82958230

--- Diff: core/src/main/java/org/apache/carbondata/scan/processor/AbstractDataBlockIterator.java ---
@@ -127,11 +133,15 @@ protected boolean updateScanner() {
     }
   }

-  private AbstractScannedResult getNextScannedResult() throws QueryExecutionException {
+  private AbstractScannedResult getNextScannedResult(QueryStatisticsRecorder recorder,
--- End diff --

@sujith71955 OK, I will use a statistics model, thanks!
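The thread does not show what the statistics model looks like, so the following is purely a sketch of the general idea: instead of passing the recorder and several counters through method parameters, one small object carries the counters. Every name here is hypothetical.

```java
// Hypothetical statistics holder; not the class added by the PR.
final class ScanStatisticsSketch {

  private long totalBlockletNum;
  private long scannedBlockletNum;

  // Called once per blocklet; 'scanned' is false when the blocklet was
  // skipped, for example by filter pruning.
  void recordBlocklet(boolean scanned) {
    totalBlockletNum++;
    if (scanned) {
      scannedBlockletNum++;
    }
  }

  long getTotalBlockletNum() {
    return totalBlockletNum;
  }

  long getScannedBlockletNum() {
    return scannedBlockletNum;
  }
}
```

A method such as `getNextScannedResult` would then take a single model argument, which keeps its signature stable when further counters are added.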
[GitHub] incubator-carbondata pull request #225: [CARBONDATA-295]Abstract Compressor ...
Github user Zhangshunyu closed the pull request at: https://github.com/apache/incubator-carbondata/pull/225
[jira] [Created] (CARBONDATA-295) Abstract Snappy interface and separate it from Compressor interface
zhangshunyu created CARBONDATA-295: -- Summary: Abstract Snappy interface and separate it from Compressor interface Key: CARBONDATA-295 URL: https://issues.apache.org/jira/browse/CARBONDATA-295 Project: CarbonData Issue Type: Improvement Components: data-load Affects Versions: 0.1.1-incubating Reporter: zhangshunyu Assignee: zhangshunyu Priority: Minor Fix For: 0.2.0-incubating Currently, the only compressor we have is the Snappy compressor, which extends the Compressor interface. For future expansion, we need to abstract a Snappy interface and separate it from the Compressor interface. That is, the Compressor interface is the parent of all compressors, and the SnappyCompressor interface and the other compressors' interfaces (or abstract classes) should extend the Compressor interface. For the different data types handled by each compressor, an implementation would extend its own interface or abstract class. For example: Compressor -> SnappyCompressor -> SnappyDoubleCompression.
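The proposed hierarchy, Compressor -> SnappyCompressor -> SnappyDoubleCompression, could be sketched as below. The generic type parameter and the method names are assumptions. To keep the example self-contained, the leaf class does not call Snappy at all: it only serializes the doubles to bytes as a stand-in, so it demonstrates the type structure and not real compression.

```java
import java.nio.ByteBuffer;

// Parent of all compressors (method names are assumed).
interface Compressor<T> {
  byte[] compress(T values);

  T uncompress(byte[] data);
}

// Marker level for everything backed by Snappy.
interface SnappyCompressor<T> extends Compressor<T> {
}

// Data-type-specific leaf. Stand-in implementation: plain serialization
// with no compression, so the sketch needs no external library.
final class SnappyDoubleCompression implements SnappyCompressor<double[]> {

  @Override
  public byte[] compress(double[] values) {
    ByteBuffer buffer = ByteBuffer.allocate(values.length * Double.BYTES);
    for (double value : values) {
      buffer.putDouble(value);
    }
    return buffer.array();
  }

  @Override
  public double[] uncompress(byte[] data) {
    ByteBuffer buffer = ByteBuffer.wrap(data);
    double[] values = new double[data.length / Double.BYTES];
    for (int i = 0; i < values.length; i++) {
      values[i] = buffer.getDouble();
    }
    return values;
  }
}
```

With this layout, adding another codec would mean adding a sibling of `SnappyCompressor` under `Compressor`, without touching the existing Snappy classes.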
[jira] [Created] (CARBONDATA-293) Add scan_blocklet_num for query statistics
zhangshunyu created CARBONDATA-293: -- Summary: Add scan_blocklet_num for query statistics Key: CARBONDATA-293 URL: https://issues.apache.org/jira/browse/CARBONDATA-293 Project: CarbonData Issue Type: Improvement Components: data-query Affects Versions: 0.1.1-incubating Reporter: zhangshunyu Assignee: zhangshunyu Fix For: 0.2.0-incubating Add scan_blocklet_num for query statistics
[GitHub] incubator-carbondata pull request #204: [CARBONDATA-280]Fix the bug that whe...
GitHub user Zhangshunyu reopened a pull request:

https://github.com/apache/incubator-carbondata/pull/204

[CARBONDATA-280]Fix the bug that when table properties is repeated it only set the last one

## Why raise this pr?
When a table property is repeated, only the last occurrence takes effect. For example:
```
CREATE TABLE IF NOT EXISTS carbontable
(ID Int, date Timestamp, country String, name String, phonetype String, serialname String, salary Int)
STORED BY 'carbondata'
TBLPROPERTIES('DICTIONARY_EXCLUDE'='country','DICTIONARY_INCLUDE'='ID', 'DICTIONARY_EXCLUDE'='phonetype', 'DICTIONARY_INCLUDE'='salary')
```
Because we use a map to store the properties, only salary is set to DICTIONARY_INCLUDE and only phonetype is set to DICTIONARY_EXCLUDE.

## How to solve?
**We should do a strict syntax check that requires the form 'DICTIONARY_EXCLUDE'='country,phonetype', 'DICTIONARY_INCLUDE'='ID,salary'**, and if a table property is repeated, throw a MalformedCarbonCommandException to tell the user that the table property is repeated, so that the user does not perform an erroneous operation.

## How to test?
Pass the existing test cases and the new test case for this bug.

## Test Result
CI has passed: http://136.243.101.176:8080/job/ApacheCarbonManualPRBuilder/354/testReport/

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata tbprop

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/204.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #204

commit 3e0030e04bff9d11f87471684b4b7b7a8d8b6209
Author: Zhangshunyu <zhangshu...@huawei.com>
Date: 2016-09-28T04:50:01Z

Fix the bug that when table properties is repeated it only set the last one

commit 1828b2b78b3de9f7fa127cfcc17bf24d6c138640
Author: Zhangshunyu <zhangshu...@huawei.com>
Date: 2016-09-28T05:11:12Z

Fix the test case

commit 876828400f02bb68190222e38898ccec29bb2f04
Author: Zhangshunyu <zhangshu...@huawei.com>
Date: 2016-09-28T08:34:36Z

Simply

commit a7a03508b494701ec641b66449a2f0df81e2fde0
Author: Zhangshunyu <zhangshu...@huawei.com>
Date: 2016-09-28T08:38:57Z

Simply
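The repeated-property check described in this pull request can be sketched as follows. This is not the parser code from the PR: the class and method are hypothetical, the properties are passed as plain key/value pairs, keys are compared case-insensitively as an assumption, and `IllegalArgumentException` stands in for MalformedCarbonCommandException so the example has no CarbonData dependency.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of rejecting repeated table properties.
final class TablePropertiesSketch {

  // Builds the property map in declaration order and fails on the first
  // repeated key instead of silently keeping only the last value.
  static Map<String, String> toMap(String[][] pairs) {
    Map<String, String> result = new LinkedHashMap<>();
    for (String[] pair : pairs) {
      String key = pair[0].toLowerCase(Locale.ROOT);
      if (result.containsKey(key)) {
        // Stand-in for MalformedCarbonCommandException.
        throw new IllegalArgumentException("Table property is repeated: " + pair[0]);
      }
      result.put(key, pair[1]);
    }
    return result;
  }
}
```

With this check, the statement in the description would be rejected at the second `DICTIONARY_EXCLUDE`, while the comma-separated form with one entry per key would be accepted.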
[GitHub] incubator-carbondata pull request #222: [CARBONDATA-221] Fix the bug of inve...
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/222

[CARBONDATA-221] Fix the bug of inverted index that store inverted index in metadata.

## Why raise this pr?
1. Problem: in the current code, the inverted index setting from the DDL is not persisted in the store, so after the cluster restarts, queries might return mismatched results.
2. To work around problem 1, the current code always sets the inverted index to true, so the inverted index can no longer be configured, which is not reasonable. We should fix this problem at its root cause.

## How to solve?
Use the Encoding as the identifier to check whether the inverted index is used. This Encoding is already part of the thrift format, so there is no need to modify the thrift format.

## How to test?
Pass all the test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata fix_index

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/222.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #222

commit c27a8a9e33529e53020c477c70d0c079724070d2
Author: Zhangshunyu <zhangshu...@huawei.com>
Date: 2016-09-08T07:48:03Z

Save useInvertedIndex info into thrift store

commit 3c8da81869e1a8eca8bdde3d82bc0a9d185bdc3d
Author: Zhangshunyu <zhangshu...@huawei.com>
Date: 2016-09-08T07:48:15Z

Save useInvertedIndex info into thrift store

commit b834e4889f5c5eadcee1c232c1a6070df0c1bf60
Author: Zhangshunyu <zhangshu...@huawei.com>
Date: 2016-09-08T09:46:12Z

Fix the judge of no_dic_col

commit e8b338c2a7a9e3e28a591bdfe57a5f704f1496d6
Author: Zhangshunyu <zhangshu...@huawei.com>
Date: 2016-09-08T10:04:20Z

add commont
[jira] [Created] (CARBONDATA-289) Support MB/M for table block size and update the doc about this new feature.
zhangshunyu created CARBONDATA-289: -- Summary: Support MB/M for table block size and update the doc about this new feature. Key: CARBONDATA-289 URL: https://issues.apache.org/jira/browse/CARBONDATA-289 Project: CarbonData Issue Type: Bug Components: spark-integration Affects Versions: 0.1.0-incubating Reporter: zhangshunyu Assignee: zhangshunyu Priority: Minor Fix For: 0.2.0-incubating Support MB/M for table block size and update the doc about this new feature.
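Accepting an optional MB or M suffix on the block size value could be handled along these lines. The issue does not describe the implementation, so the class, the method, the case-insensitive matching, and the choice to treat a bare number as MB are all assumptions.

```java
import java.util.Locale;

// Hypothetical parser for values such as "1024", "1024M" or "1024MB".
final class BlockSizeParserSketch {

  // Returns the block size in MB; a bare number is assumed to be in MB.
  static int parseBlockSizeInMb(String raw) {
    String value = raw.trim().toUpperCase(Locale.ROOT);
    if (value.endsWith("MB")) {
      value = value.substring(0, value.length() - 2);
    } else if (value.endsWith("M")) {
      value = value.substring(0, value.length() - 1);
    }
    int size;
    try {
      size = Integer.parseInt(value.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Invalid table block size: " + raw, e);
    }
    if (size <= 0) {
      throw new IllegalArgumentException("Table block size must be positive: " + raw);
    }
    return size;
  }
}
```

Checking the longer suffix first matters: testing for "M" before "MB" would leave a stray "B" only in a differently written parser, but keeping the order explicit avoids that class of mistake.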
[GitHub] incubator-carbondata pull request #204: [CARBONDATA-280]Fix the bug that whe...
Github user Zhangshunyu closed the pull request at: https://github.com/apache/incubator-carbondata/pull/204
[GitHub] incubator-carbondata pull request #52: [WIP] Support varchar datatype as SPA...
Github user Zhangshunyu closed the pull request at: https://github.com/apache/incubator-carbondata/pull/52
Re: [Discuss]Set block_size for table on table level
For each table, we can set the block size according to the data size. When a query is executed, each task takes one block to process at a time, so when the number of blocks is smaller than the parallelism, setting a reasonable block size gives a more suitable number of blocks and makes the best use of the parallelism. -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discuss-Set-block-size-for-table-on-table-level-tp1472p1538.html Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
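The relationship between data size, block size, and parallelism in this message can be worked through with a small sketch. The class and methods are hypothetical, and the model is deliberately simplified: it assumes the data splits evenly into blocks and that one task handles one block at a time.

```java
// Simplified arithmetic sketch; not CarbonData's block planning logic.
final class BlockCountSketch {

  // Number of blocks needed to hold the data (ceiling division).
  static long blockCount(long dataSizeBytes, long blockSizeBytes) {
    return (dataSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
  }

  // Largest block size that still yields at least 'parallelism' blocks.
  static long maxBlockSizeForParallelism(long dataSizeBytes, int parallelism) {
    if (parallelism <= 0) {
      throw new IllegalArgumentException("parallelism must be positive");
    }
    return Math.max(1L, dataSizeBytes / parallelism);
  }
}
```

For example, 4 GB of data with a 1024 MB block size gives only 4 blocks, so at most 4 of 16 available tasks would have work. Choosing 256 MB instead gives 16 blocks, one per task.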
[GitHub] incubator-carbondata pull request #204: [CARBONDATA-280]Fix the bug that whe...
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/204

[CARBONDATA-280]Fix the bug that when table properties is repeated it only set the last one

## Why raise this pr?
When a table property is repeated, only the last occurrence takes effect. For example:
```
CREATE TABLE IF NOT EXISTS carbontable
(ID Int, date Timestamp, country String, name String, phonetype String, serialname String, salary Int)
STORED BY 'carbondata'
TBLPROPERTIES('DICTIONARY_EXCLUDE'='country','DICTIONARY_INCLUDE'='ID', 'DICTIONARY_EXCLUDE'='phonetype', 'DICTIONARY_INCLUDE'='salary')
```
Only salary is set to DICTIONARY_INCLUDE and only phonetype is set to DICTIONARY_EXCLUDE.

## How to solve?
We should do a strict syntax check, and if a table property is repeated, throw a MalformedCarbonCommandException to tell the user that the table property is repeated, so that the user does not perform an erroneous operation.

## How to test?
Pass the existing test cases and the new test case for this bug.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata tbprop

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/204.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #204

commit 3e0030e04bff9d11f87471684b4b7b7a8d8b6209
Author: Zhangshunyu <zhangshu...@huawei.com>
Date: 2016-09-28T04:50:01Z

Fix the bug that when table properties is repeated it only set the last one
[jira] [Created] (CARBONDATA-280) when table properties is repeated it only set the last one
zhangshunyu created CARBONDATA-280: -- Summary: when table properties is repeated it only set the last one Key: CARBONDATA-280 URL: https://issues.apache.org/jira/browse/CARBONDATA-280 Project: CarbonData Issue Type: Bug Components: sql Affects Versions: 0.1.1-incubating Reporter: zhangshunyu Assignee: zhangshunyu Priority: Minor Fix For: 0.2.0-incubating When a table property is repeated, only the last occurrence takes effect. For example, with CREATE TABLE IF NOT EXISTS carbontable (ID Int, date Timestamp, country String, name String, phonetype String, serialname String, salary Int) STORED BY 'carbondata' TBLPROPERTIES('DICTIONARY_EXCLUDE'='country','DICTIONARY_INCLUDE'='ID', 'DICTIONARY_EXCLUDE'='phonetype', 'DICTIONARY_INCLUDE'='salary') only salary is set to DICTIONARY_INCLUDE and only phonetype is set to DICTIONARY_EXCLUDE.
Re: Reply: [Discuss]Set block_size for table on table level
I have verified that it would not affect the older tables. -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discuss-Set-block-size-for-table-on-table-level-tp1472p1531.html
Re: [jira] [Created] (CARBONDATA-275) org.apache.thrift.TBaseHelper.hashCode(int) can't find this function
Using thrift 0.9.3 can solve this problem. -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/jira-Created-CARBONDATA-275-org-apache-thrift-TBaseHelper-hashCode-int-can-t-find-this-function-tp1488p1530.html
[GitHub] incubator-carbondata pull request #195: FIX CI
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/195 FIX CI You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata FIXCI Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/195.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #195 commit 22b1f1491d5e5306db012a7541aa30790d11cdae Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-23T08:08:16Z FIX CI
[GitHub] incubator-carbondata pull request #191: [WIP] Change delete segments parser
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/191 [WIP] Change delete segments parser You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata parser925 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/191.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #191 commit abd270f2c6114e35e0aa1da71c9b2498187357b8 Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-22T12:53:52Z New parser gram commit a8d18e07f68cb469e235fd7eebca9df3630163e7 Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-22T12:54:09Z New parser gram
[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/189

[CARBONDATA-267] Set block_size for table on table level

## Why raise this pr?
To make the block file size configurable for each table, at the table level.

## How to solve?
Add a new parameter to TableSchema. When a table is created, the value is set in the table properties and written into the thrift file.

## How to test?
Pass all the test cases and the new test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata blocksize922

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/189.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #189

commit ed40d0f8012297cc9e9cffb3812ef5d141e03879
Author: Zhangshunyu <zhangshu...@huawei.com>
Date: 2016-09-22T08:19:29Z

Set block_size for table on table level
[GitHub] incubator-carbondata pull request #188: [WIP] Add table_block_size on table ...
Github user Zhangshunyu closed the pull request at: https://github.com/apache/incubator-carbondata/pull/188
[jira] [Created] (CARBONDATA-267) Set block_size for table on table level
zhangshunyu created CARBONDATA-267: -- Summary: Set block_size for table on table level Key: CARBONDATA-267 URL: https://issues.apache.org/jira/browse/CARBONDATA-267 Project: CarbonData Issue Type: New Feature Affects Versions: 0.1.0-incubating Reporter: zhangshunyu Assignee: zhangshunyu Fix For: 0.2.0-incubating Set block_size for table on table level
[GitHub] incubator-carbondata pull request #188: [WIP] Add table_block_size on table ...
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/188 [WIP] Add table_block_size on table level. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata block_size Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/188.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #188 commit a1243394de4daa6ea8fdc1024266dec2e40b45ea Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-21T08:43:47Z Add delete all carbon tables commit 5783520db18e2fdfac3ab98c9436a5ff06228988 Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-21T09:01:46Z Add test case commit 602cc09b2fb6f5169267264ef9c1190717e5fae9 Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-21T09:20:30Z Fix test case commit eda8f100f4fd4c42f46061cb72681df16d6cfb84 Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-21T09:47:49Z Fix test case commit 847d21e3310e9c9e7424e7324064358a2b2bce5f Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-21T09:49:36Z Fix test case commit 63eb0dd21d6ca64a743ed2e3311d42a676ae48b3 Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-22T03:03:58Z Add a new blocksize format param commit 78ee8a46949417bfa06b274c9f2750299aba997b Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-22T03:22:31Z Add a new blocksize format param commit eed034458f438eb1154889e53bfe85775a476abe Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-22T03:32:51Z Add a new blocksize format param commit 01565eafea966317bdfa46d52c274bd1bdc39671 Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-22T07:14:04Z Add parser level
[GitHub] incubator-carbondata pull request #185: Add a new feature that support delet...
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/185 Add a new feature that support delete all carbon tables under one database. Why raise this pr? Add a new feature that supports deleting all carbon tables under one database. **Only carbon tables are deleted; other tables are not affected.** How to test? Pass all the test cases, including the new test case. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata dropall Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/185.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #185 commit bbac6bc2db5ecc88ac8dd886108054ab22c726e4 Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-21T08:43:47Z Add delete all carbon tables commit f0e922673bb3801b50fe19d9ef279ece028ebfac Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-21T09:01:46Z Add test case
[GitHub] incubator-carbondata pull request #178: [WIP] Fix NULL values issue
Github user Zhangshunyu closed the pull request at: https://github.com/apache/incubator-carbondata/pull/178
[GitHub] incubator-carbondata pull request #178: [WIP] Fix NULL values issue
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/178 [WIP] Fix NULL values issue You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata null Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/178.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #178 commit a2b46eaf3add1fd3923e7dc1010ad5ea72d6341f Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-20T12:29:11Z Fix
[jira] [Created] (CARBONDATA-254) Code Inspection Optiminization
zhangshunyu created CARBONDATA-254: -- Summary: Code Inspection Optiminization Key: CARBONDATA-254 URL: https://issues.apache.org/jira/browse/CARBONDATA-254 Project: CarbonData Issue Type: Improvement Reporter: zhangshunyu Code Inspection Optiminization -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-carbondata pull request #171: [WIP] Code Inspection Optiminization
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/171 [WIP] Code Inspection Optiminization You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata codeinspection Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/171.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #171 commit 31aca94d187e436c30dde56e2fa438f2bc250f5d Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-18T03:00:35Z inspection commit 7a200303705ee87fb1543f1b9867a5251a746d09 Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-18T03:17:19Z code inspection optiminization
[GitHub] incubator-carbondata pull request #52: [CARBONDATA-104] Support varchar data...
Github user Zhangshunyu closed the pull request at: https://github.com/apache/incubator-carbondata/pull/52
[jira] [Created] (CARBONDATA-231) Rename repeared table names in same test file and add drop tables.
zhangshunyu created CARBONDATA-231: -- Summary: Rename repeared table names in same test file and add drop tables. Key: CARBONDATA-231 URL: https://issues.apache.org/jira/browse/CARBONDATA-231 Project: CarbonData Issue Type: Improvement Reporter: zhangshunyu
[GitHub] incubator-carbondata pull request #142: [WIP][CARBONDATA-221] Fix the bug of...
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/142 [WIP][CARBONDATA-221] Fix the bug of inverted index that store inverted index in metadata. ## Why raise this pr? The inverted index setting from the DDL was not persisted in the store, so after a cluster restart query results might mismatch. ## How to solve? Use the Encoding as the identifier to check whether a column uses an inverted index. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata index Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/142.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #142 commit f76908319c8eae0b65ea2cd9d0f5899225c95667 Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-08T07:48:03Z Save useInvertedIndex info into thrift store commit ecd5403105fd37da239db66bd313cf548a532eef Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-08T07:48:15Z Save useInvertedIndex info into thrift store
[GitHub] incubator-carbondata pull request #129: Remove not needed parameters
Github user Zhangshunyu closed the pull request at: https://github.com/apache/incubator-carbondata/pull/129
[GitHub] incubator-carbondata pull request #129: Remove not needed parameters
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/129 Remove not needed parameters Many parameters in CarbonCommonConstants are no longer used and should be removed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata mater Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/129.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #129 commit 296a7350a667a13a46c74b2b38d36ee8dc13f53f Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-05T06:33:36Z remove not needed parameters
[GitHub] incubator-carbondata pull request #117: [WIP]Fix the bug that when subquery ...
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/117 [WIP]Fix the bug that when subquery with sort and filter the result is empty. ## Why raise this pr? Fix this bug: when a query has a subquery with sort and filter, it returns no result. ## How to solve? For such a query, the plan optimized by Spark never pushes down the filter, and as a result the sort output is not decoded by Carbon; when the filter is applied, Spark cannot resolve the int values as string values. So we should decode them earlier when the child of the filter is a sort. How to test? New test cases were added; they should pass, along with CI. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata query91 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/117.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #117 commit 31eb578927044a2d6ea8d80c487b5b5c4f73004a Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-01T07:33:01Z Fix the bug that subquery with sort and filter the result is empty commit 6ecc2e69172c4103fca2ff6d7173a2236be4fc03 Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-09-01T07:45:54Z Fix the bug that subquery with sort and filter the result is empty
[GitHub] incubator-carbondata pull request #110: [CARBONDATA-193]Fix the bug that neg...
Github user Zhangshunyu commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/110#discussion_r76929947 --- Diff: core/src/main/java/org/apache/carbondata/core/util/ValueCompressionUtil.java --- @@ -78,26 +78,26 @@ private ValueCompressionUtil() { private static DataType getDataType(double value, int decimal, byte dataTypeSelected) { DataType dataType = DataType.DATA_DOUBLE; if (decimal == 0) { - if (value < Byte.MAX_VALUE) { + if (value < Byte.MAX_VALUE && value > Byte.MIN_VALUE) { --- End diff -- This was needed before I changed the code to use the new "absMaxValue", but now I can delete it.
[GitHub] incubator-carbondata pull request #110: [wip][CARBONDATA-193]Fix the bug tha...
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/110 [wip][CARBONDATA-193]Fix the bug that negative data compress is not properly when datatype is Double ## Why raise this pr? **Fix bug: negative data is not compressed properly when the datatype is Double.** For example, if the column datatype is double and its data looks like this: -7489.7976 -11234567490 -11234567490 -1.2 -2 -11234567490 -11234567490 -11234567490 -11234567490 **the query result would be all 0, which is a bug.** ## How to solve? The bug occurs because we assume the max value of the column is positive and only compare it with Byte.MAX_VALUE, which is 127. A value such as -12343554634645 is also < Byte.MAX_VALUE, but we cannot use byte for it because it is below Byte.MIN_VALUE (-128). So we should check against both Byte.MAX_VALUE and Byte.MIN_VALUE, and likewise for the other datatypes. How to test? Added test case: test("When the values of Double datatype are negative values"); all existing cases and this new test case should pass. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata double92 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/110.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #110 commit 5773928ff7c7867a1925ce258fb2b57bc49513b6 Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-08-31T01:34:12Z Fix the bug that negtive data compress is not properly when datatype is Double
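The two-sided check described in this PR can be sketched roughly as follows. This is only an illustration for the whole-number case (decimal == 0), not the code in ValueCompressionUtil: the class `IntegralTypeChooser`, the enum `Width` and the method `choose` are hypothetical names, and the inclusive bounds are an assumption (the quoted diff uses strict comparisons).

```java
/** Hypothetical sketch; these names do not exist in CarbonData. */
final class IntegralTypeChooser {

    /** Candidate storage widths, narrowest first. */
    enum Width { BYTE, SHORT, INT, LONG }

    /**
     * Picks the narrowest width whose range contains both the column minimum
     * and the column maximum. Checking only the maximum would wrongly select
     * BYTE for a large negative value such as -11234567490.
     */
    static Width choose(double min, double max) {
        if (min >= Byte.MIN_VALUE && max <= Byte.MAX_VALUE) {
            return Width.BYTE;
        }
        if (min >= Short.MIN_VALUE && max <= Short.MAX_VALUE) {
            return Width.SHORT;
        }
        if (min >= Integer.MIN_VALUE && max <= Integer.MAX_VALUE) {
            return Width.INT;
        }
        return Width.LONG;
    }
}
```

With the sample data from the description, a column whose minimum is -11234567490 would fall through to the widest type even though its maximum (-2) is far below Byte.MAX_VALUE.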
[jira] [Created] (CARBONDATA-193) Data is not loading properly when double data type is having negative values
zhangshunyu created CARBONDATA-193: -- Summary: Data is not loading properly when double data type is having negative values Key: CARBONDATA-193 URL: https://issues.apache.org/jira/browse/CARBONDATA-193 Project: CarbonData Issue Type: Bug Reporter: zhangshunyu For example: -7489.797600 -11234567489.797 -11234567489.7 -1.2 -2 -11234567489.797600 -11234567489.797600 -11234567489.797600 -11234567489.797600 would be all 0 after query
[GitHub] incubator-carbondata pull request #104: [CARBONDATA-188] Compress CSV file b...
Github user Zhangshunyu commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/104#discussion_r76580960 --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java --- @@ -112,25 +116,29 @@ private void initializeReader() throws IOException { // if already one input stream is open first we need to close and then // open new stream close(); -// get the block offset -long startOffset = this.csvParserVo.getBlockDetailsList().get(blockCounter).getBlockOffset(); -FileType fileType = FileFactory - .getFileType(this.csvParserVo.getBlockDetailsList().get(blockCounter).getFilePath()); -// calculate the end offset the block -long endOffset = - this.csvParserVo.getBlockDetailsList().get(blockCounter).getBlockLength() + startOffset; - -// create a input stream for the block -DataInputStream dataInputStream = FileFactory - .getDataInputStream(this.csvParserVo.getBlockDetailsList().get(blockCounter).getFilePath(), -fileType, bufferSize, startOffset); -// if start offset is not 0 then reading then reading and ignoring the extra line -if (startOffset != 0) { - LineReader lineReader = new LineReader(dataInputStream, 1); - startOffset += lineReader.readLine(new Text(), 0); + +String path = this.csvParserVo.getBlockDetailsList().get(blockCounter).getFilePath(); +FileType fileType = FileFactory.getFileType(path); + +if (path.endsWith(".gz")) { + DataInputStream dataInputStream = + FileFactory.getCompressedDataInputStream(path, fileType, bufferSize); + inputStreamReader = new BufferedReader(new InputStreamReader(dataInputStream)); +} else { + long startOffset = this.csvParserVo.getBlockDetailsList().get(blockCounter).getBlockOffset(); + long blockLength = this.csvParserVo.getBlockDetailsList().get(blockCounter).getBlockLength(); + long endOffset = blockLength + startOffset; + + DataInputStream dataInputStream = FileFactory.getDataInputStream(path, fileType, bufferSize); + + // if start offset is not 0 then reading then reading and ignoring the extra line + if (startOffset != 0) { +LineReader lineReader = new LineReader(dataInputStream, 1); +startOffset += lineReader.readLine(new Text(), 0); + } + inputStreamReader = new BufferedReader(new InputStreamReader( + new BoundedDataStream(dataInputStream, endOffset - startOffset))); --- End diff -- Cannot find class BoundedDataStream
[GitHub] incubator-carbondata pull request #103: Fix the bug that when using Decimal ...
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/103 Fix the bug that when using Decimal type as dictionary gen surrogate key will mismatch for the same values during increment load. ## Why raise this pr? **Fix bug: when a Decimal column uses a dictionary, the generated surrogate keys mismatch for the same values during incremental load.** For example, when a Decimal column uses a dictionary, `DataTypeUtil.normalizeColumnValueForItsDataType` normalizes the data: for the decimal value 45, if the precision of the column is 3, parsedValue is 45.000, and this 45.000 is written into the dictionary file by writer.write(parsedValue). As a result, the second time we load the same value 45, dictionary.getSurrogateKey(value) compares the value with the dictionary values, but the value is 45 while the dictionary value is 45.000 stored as a string, so the dictionary concludes that it does not contain 45. This leads to repeated values in the dictionary, which is a mistake. How to solve this? Before looking up the surrogate key, if the datatype is decimal, we first use its parsedValue as the lookup value, so that 45 is not treated as a different value. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata decimalDic Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/103.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #103 commit 0403b9fe4ed32b9cbc4727b5a541cfccb089422e Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-08-29T08:29:54Z Fix the bug that when Decimal type as dictionary gen surrogate key will mismatch for the same values
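To make the normalize-before-lookup idea from this PR concrete, here is a minimal sketch. It does not use CarbonData's dictionary classes: `DecimalDictionarySketch` and its methods are hypothetical, the in-memory map stands in for the dictionary file, and it assumes the "precision" of 3 in the description means three digits after the decimal point (the scale, in `BigDecimal` terms).

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not CarbonData's dictionary implementation. */
final class DecimalDictionarySketch {
    // Stand-in for the dictionary file: normalized string -> surrogate key.
    private final Map<String, Integer> surrogateKeys = new HashMap<>();
    private final int scale;

    DecimalDictionarySketch(int scale) {
        this.scale = scale;
    }

    /** Normalizes a raw decimal string to the column's scale, e.g. "45" -> "45.000". */
    String normalize(String raw) {
        return new BigDecimal(raw).setScale(scale, RoundingMode.HALF_UP).toPlainString();
    }

    /** Looks up (or assigns) a surrogate key using the normalized form, never the raw input. */
    int surrogateKeyFor(String raw) {
        String key = normalize(raw);
        Integer existing = surrogateKeys.get(key);
        if (existing != null) {
            return existing;
        }
        int next = surrogateKeys.size() + 1;
        surrogateKeys.put(key, next);
        return next;
    }

    int size() {
        return surrogateKeys.size();
    }
}
```

Because both the stored entry and the lookup value pass through `normalize`, loading 45 a second time finds the existing 45.000 entry instead of adding a duplicate.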
[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Fix the bug that the...
GitHub user Zhangshunyu reopened a pull request: https://github.com/apache/incubator-carbondata/pull/81 [CARBONDATA-132] Fix the bug that the CSV file header exception can not be shown to user using beeline. ## Why raise this pr: **For bug fix: The exception 'CSV File provided is not proper. Column names in schema and csv header are not same' is not shown in beeline.** For example, when a data load fails because of a wrong CSV file header in the load DDL, the exception message appears only on the executor side, like "CSV header provided in DDL is not proper. Column names in schema and CSV header are not the same", but the **user using beeline cannot get it from the driver side because the driver only shows "Dataload Failure"**. It is very inconvenient for the user to find the reason unless they check the executor log. ## How to solve: Catch the exception on the driver side, parse its cause, and pass the cause message to the driver. DataLoadingException is shown because it is mainly about the CSV file and is wrapped in an understandable message that can be shown to the user. ## How to test Add new test cases: 1. If neither the DDL nor the file has a file header: Beeline will show: "DataLoad failure: CSV File provided is not proper. Column names in schema and csv header are not same. CSVFile Name : windows.csv" 2. If the DDL did not provide the proper file header: Beeline will show: "DataLoad failure: CSV header provided in DDL is not proper. Column names in schema and CSV header are not the same." 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata exc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/81.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #81 commit d6c32cb6ea80ccfe9f7aee1e14236d90933fce1a Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-08-22T02:08:26Z Parse some Spark exception from executor side and show them directly on driver commit 5e1235ae317ea1915f3cba24f57fad6634754af3 Author: mohammadshahidkhan <mohdshahidkhan1...@gmail.com> Date: 2016-08-09T05:17:02Z CARBONDATA-153 Record count is not matching while loading the data when one data node went down in HA setup commit 62c0b05e62e3c2cadc03e6355bd587d14eab355c Author: Venkata Ramana G <ramana.gollam...@huawei.com> Date: 2016-08-22T13:29:06Z [CARBONDATA-153] This closes #77 commit 7e0584e7a1d90724e88fffd6fcea15e5ba640da8 Author: manishgupt88 <tomanishgupt...@gmail.com> Date: 2016-07-19T09:25:52Z Perform equal distribution of dictionary values among the sublists of a list whenever a dictionary file is loaded into memory commit 2d4609cdface93ea3f3a7a92e088e5b98f24f7e2 Author: Venkata Ramana G <ramana.gollam...@huawei.com> Date: 2016-08-23T14:02:03Z [CARBONDATA-80] This closes #44 commit fe1b0f07deda03fe21b98191be7750bf61d8520c Author: mohammadshahidkhan <mohdshahidkhan1...@gmail.com> Date: 2016-07-20T10:32:18Z CARBONDATA-117 BlockLet distribution for optimum resource usage commit 5ebf90a87999b9dd5ec484e54aceb7487ca3096f Author: Venkata Ramana G <ramana.gollam...@huawei.com> Date: 2016-08-23T15:00:07Z [CARBONDATA-117] This closes #56 commit 61e40eb0033fca3ffc8d09d392b6090cde284652 Author: ravikiran <ravikiran.sn...@gmail.com> Date: 2016-08-23T13:58:51Z Delete the lock file once the unlocking is done. 
commit 64586059241589ecae6e8846ff4643ab03647041 Author: Venkata Ramana G <ramana.gollam...@huawei.com> Date: 2016-08-23T15:30:04Z [CARBONDATA-170] This closes #86 commit 897c12a031791f60a80f859093837cbd6989e84c Author: Jay357089 <liujunj...@huawei.com> Date: 2016-08-22T12:19:06Z colDict_Alldict commit c11058d7435f4176b1fee1d9fe637eb233936a6a Author: Venkata Ramana G <ramana.gollam...@huawei.com> Date: 2016-08-23T18:59:57Z [CARBONDATA-169] This closes #83 commit eac5573a644118c4942715f15e629ffa9ca1141b Author: mohammadshahidkhan <mohdshahidkhan1...@gmail.com> Date: 2016-08-23T15:17:28Z [CARBONDATA-171] Block distribution not proper when the number of active executors more than the node size commit 1a28ada21af0f0ff975c93252fdbec959974e542 Author: Venkata Ramana G <ramana.gollam...@huawei.com> Date: 2016-08-23T19:28:45Z [CARBONDATA-171] This closes #87 commit d981c0d06e0a9f0881533f87c405b4464f71019c Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-08-24T02:06:02Z fix review comments commit 6e4b21e5372c7b0d4c47dcd0d3366148d717526c Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-08-24T03:11:22Z add test case commit b59d5c77d80
[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Fix the bug that the...
Github user Zhangshunyu closed the pull request at: https://github.com/apache/incubator-carbondata/pull/81
[jira] [Created] (CARBONDATA-178) table not exist when execute show segments using spark-sql and beeline the same time
zhangshunyu created CARBONDATA-178: -- Summary: table not exist when execute show segments using spark-sql and beeline the same time Key: CARBONDATA-178 URL: https://issues.apache.org/jira/browse/CARBONDATA-178 Project: CarbonData Issue Type: Bug Reporter: zhangshunyu 1. When beeline and sparksql are used at the same time, if a table is created and loaded through sparksql, both beeline and sparksql see the table via show tables, but executing show segments from beeline throws an exception saying the table does not exist. 2. If beeline is restarted, or the table is selected before show segments, it works.
[GitHub] incubator-carbondata pull request #94: Fix bug that table not exist when exe...
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/94 Fix bug that table not exist when execute show segments using spark-sql and beeline the same time. ## Why raise this pr: 1. When beeline and sparksql are used at the same time, if a table is created and loaded through sparksql, both beeline and sparksql see the table via show tables, but executing show segments from beeline throws an exception saying the table does not exist. 2. If beeline is restarted, or the table is selected before show segments, it works. ## How to solve this: The problem is that beeline and sparksql run in different processes, each with its own tableInfoMap, so beeline cannot find the table in its own map even though sparksql has put the table into its tableInfoMap. So we can use tableExists to check: checkSchemasModifiedTimeAndReloadTables in tableExists checks "modifiedTime.mdt", and if it has been changed by a different process (here, sparksql), the other process (here, beeline) reloads the metadata first. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata showloadbug Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/94.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #94 commit 9e1a4d918e355aed17643b9d4e97bee81147b774 Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-08-25T03:40:35Z Fix the bug that table not exist exception occured when using sparksql and beeline the same time
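The reload-on-modified-time check described in this PR might look roughly like the sketch below. It is not the CarbonData metastore code: `TableCatalogSketch` is a hypothetical class, the `LongSupplier` stands in for reading the timestamp of "modifiedTime.mdt", and the `Supplier` stands in for reading table metadata from the shared store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch of a per-process table cache that reloads when a shared marker changes. */
final class TableCatalogSketch {
    private final LongSupplier schemaModifiedTime;       // assumed: timestamp of the shared marker file
    private final Supplier<Map<String, String>> loader;  // assumed: reads table metadata from the shared store
    private Map<String, String> tables = new HashMap<>();
    private long lastSeenModifiedTime = Long.MIN_VALUE;

    TableCatalogSketch(LongSupplier schemaModifiedTime, Supplier<Map<String, String>> loader) {
        this.schemaModifiedTime = schemaModifiedTime;
        this.loader = loader;
    }

    /** Reloads the local cache if another process changed the schema, then answers from the cache. */
    boolean tableExists(String tableName) {
        long current = schemaModifiedTime.getAsLong();
        if (current != lastSeenModifiedTime) {
            tables = new HashMap<>(loader.get());
            lastSeenModifiedTime = current;
        }
        return tables.containsKey(tableName);
    }
}
```

In this picture, each of beeline and sparksql would hold its own `TableCatalogSketch`; a table created by one process becomes visible to the other the next time `tableExists` notices the changed timestamp.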
[GitHub] incubator-carbondata pull request #89: Fix the problem of hdfs lock and move...
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/89 Fix the problem of hdfs lock and move the lock file inside the table folder. ## Why raise this pr 1. The hdfs lock file for a table should be placed inside that table's store path, which is more reasonable; if the store path is not set, we put it into hadoop.tmp.dir. For example: if the store path of carbon on hdfs is /user/hive/warehouse/carbon.store, then the lock file for this table would be: /user/hive/warehouse/carbon.store/default/table_name/meta.lock 2. This bug was found as follows: sometimes hadoop is configured with a wrong hadoop.tmp.dir; hadoop can still work normally, but carbon's hdfs lock cannot, and it throws the exception: "Table is locked for updation. Please try after some time". You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata hdfslock Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/89.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #89 commit 2ea2bfbef39622d7371c3afcdd2bbe5ce278bb22 Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-08-24T06:30:50Z fix the problem of hdfs lock and move the lock file inside the table folder
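The path rule from point 1 can be illustrated with a small sketch. `LockPathSketch.lockFilePath` is a hypothetical helper, not a CarbonData API, and the layout used under the fallback directory (the same database/table/meta.lock structure) is an assumption; the PR only states that hadoop.tmp.dir is used when the store path is not set.

```java
/** Hypothetical sketch of resolving the lock file location for one table. */
final class LockPathSketch {

    /**
     * Returns storePath/db/table/meta.lock, falling back to tmpDir as the base
     * when no store path is configured (layout under tmpDir is assumed).
     */
    static String lockFilePath(String storePath, String tmpDir, String database, String table) {
        String base = (storePath == null || storePath.isEmpty()) ? tmpDir : storePath;
        // Drop a trailing slash so the joined path has no doubled separator.
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        return base + "/" + database + "/" + table + "/meta.lock";
    }
}
```

Calling it with the store path from the example, database `default` and table `table_name` gives the lock file location quoted in the description.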
[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Parse some Spark exc...
Github user Zhangshunyu commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/81#discussion_r75984291 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -775,6 +777,13 @@ object CarbonDataRDDFactory extends Logging { loadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE logInfo("DataLoad failure") logger.error(ex) --- End diff -- OK
[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Parse some Spark exc...
Github user Zhangshunyu commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/81#discussion_r75984276 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -775,6 +777,13 @@ object CarbonDataRDDFactory extends Logging { loadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE logInfo("DataLoad failure") logger.error(ex) + ex match { +case sparkException: SparkException => + if (sparkException.getCause.isInstanceOf[DataLoadingException]) { +executorMessage = sparkException.getCause.getMessage + } +case _ => --- End diff -- Here we only take the DataLoadingException from the executor and show it directly to the user, so that they can see which operation was incorrect; for other exceptions we still use "DataLoad Failure", because we do not show internal errors to the user.
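The rule explained in this comment, restated as a Java sketch for illustration (the actual change is the Scala match in the diff above). Both exception classes here are hypothetical stand-ins: `ExecutorFailure` plays the role of SparkException and the nested `DataLoadingException` plays the role of CarbonData's class of the same name; the "DataLoad failure: ..." message format follows the beeline examples in the PR description.

```java
/** Hypothetical sketch: expose only user-facing load errors, hide internal ones. */
final class LoadFailureMessageSketch {

    /** Stand-in for the user-facing exception raised on the executor. */
    static class DataLoadingException extends Exception {
        DataLoadingException(String message) {
            super(message);
        }
    }

    /** Stand-in for the wrapper exception the driver receives from the executor. */
    static class ExecutorFailure extends RuntimeException {
        ExecutorFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Appends the cause message only when the cause is a DataLoadingException. */
    static String messageForUser(Throwable ex) {
        String message = "DataLoad failure";
        if (ex instanceof ExecutorFailure && ex.getCause() instanceof DataLoadingException) {
            message = message + ": " + ex.getCause().getMessage();
        }
        return message;
    }
}
```

Any other failure, including a wrapper whose cause is some internal exception, still yields the generic "DataLoad failure" text.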
[GitHub] incubator-carbondata pull request #84: [CARBONDATA-167]Fix that 'UndeclaredT...
Github user Zhangshunyu closed the pull request at: https://github.com/apache/incubator-carbondata/pull/84
[jira] [Created] (CARBONDATA-166) When create table contains a shared dictionary and the shared dictionary keywords are incomplete, create table succeeds but load fails
zhangshunyu created CARBONDATA-166: -- Summary: When create table contains a shared dictionary and the shared dictionary keywords are incomplete, create table succeeds but load fails Key: CARBONDATA-166 URL: https://issues.apache.org/jira/browse/CARBONDATA-166 Project: CarbonData Issue Type: Bug Reporter: zhangshunyu Assignee: zhangshunyu
[GitHub] incubator-carbondata pull request #70: [CARBONDATA-154] Fix the bug of block...
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/70 [CARBONDATA-154] Fix the bug of block prune that query result is wrong. ## Why raise this PR: During block pruning, the end key is always decided by the last filter expression alone. This is a bug and can lead to wrong results. For example, when the loaded data has a dimension column with 12 lines of 'a', 12 lines of 'b' and 12 lines of 'c', a query like "select * from tablename where colname='c' or colname='b' or colname='a'" selects only the 12 lines of 'a' because of the wrong end key. ## How to solve this: Compute the keys from all filter expressions: at each column level, take the maximum of the end keys and (minimum - 1) of the start keys, and use these to produce a new start key and a new end key. For more details, please look at the test case. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata blockprune Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/70.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #70 commit d2bdb538f8b30df39ae3730aa0498a44ed934f03 Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-08-10T03:09:20Z fix the bug og block prune
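The effect described in this pull request can be reproduced with a small model. This is a sketch under stated assumptions, not CarbonData's pruning code: `KeyRange`, `BlockPruneSketch` and the surrogate keys 1, 2, 3 for 'a', 'b', 'c' are hypothetical; every range is assumed to have one entry per column; and the bounds are treated as inclusive, so the "(minimum - 1)" adjustment mentioned in the description is not modelled.

```scala
// Inclusive key range of one filter expression, one Long per dimension column.
// Hypothetical type for illustration; all ranges must have the same column count.
final case class KeyRange(start: Vector[Long], end: Vector[Long])

object BlockPruneSketch {
  // Combine OR'd expressions: column-wise minimum of the start keys and
  // column-wise maximum of the end keys, so every expression stays covered.
  def combine(ranges: Seq[KeyRange]): KeyRange = {
    require(ranges.nonEmpty, "at least one filter expression is required")
    ranges.reduce { (a, b) =>
      KeyRange(
        a.start.zip(b.start).map { case (x, y) => math.min(x, y) },
        a.end.zip(b.end).map { case (x, y) => math.max(x, y) })
    }
  }

  // Models the reported bug: the end key is taken from the last expression only.
  def endKeyFromLastOnly(ranges: Seq[KeyRange]): KeyRange =
    KeyRange(combine(ranges).start, ranges.last.end)

  // A block is kept when its key lies inside the range on every column.
  def keeps(range: KeyRange, blockKey: Vector[Long]): Boolean =
    blockKey.indices.forall(i => range.start(i) <= blockKey(i) && blockKey(i) <= range.end(i))
}
```

With the filters ordered 'c', 'b', 'a' as in the query above, `endKeyFromLastOnly` yields the range [1, 1] and keeps only the block holding 'a', while `combine` yields [1, 3] and keeps all three blocks.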
[jira] [Created] (CARBONDATA-154) Block prune can not get the right blocks and query result is wrong
zhangshunyu created CARBONDATA-154: -- Summary: Block prune can not get the right blocks and query result is wrong Key: CARBONDATA-154 URL: https://issues.apache.org/jira/browse/CARBONDATA-154 Project: CarbonData Issue Type: Bug Reporter: zhangshunyu Priority: Critical
[jira] [Created] (CARBONDATA-123) Stored by 'carbondata' or 'org.apache.carbondata.format' should not be case sensitive
zhangshunyu created CARBONDATA-123: -- Summary: Stored by 'carbondata' or 'org.apache.carbondata.format' should not be case sensitive Key: CARBONDATA-123 URL: https://issues.apache.org/jira/browse/CARBONDATA-123 Project: CarbonData Issue Type: Bug Reporter: zhangshunyu Assignee: zhangshunyu
[jira] [Created] (CARBONDATA-104) To support varchar datatype
zhangshunyu created CARBONDATA-104: -- Summary: To support varchar datatype Key: CARBONDATA-104 URL: https://issues.apache.org/jira/browse/CARBONDATA-104 Project: CarbonData Issue Type: New Feature Reporter: zhangshunyu Priority: Minor
[GitHub] incubator-carbondata pull request #24: Correct the log info
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/24 Correct the log info You can merge this pull request into a Git repository by running: $ git pull https://github.com/Zhangshunyu/incubator-carbondata info Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/24.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #24 commit e03c277d1e2327a4972fe34ece0c58b316371925 Author: Zhangshunyu <zhangshu...@huawei.com> Date: 2016-07-06T03:49:41Z correct the log info