[jira] [Created] (CARBONDATA-3155) DataFrame support read CarbonSession/SDK written data
xubo245 created CARBONDATA-3155: --- Summary: DataFrame support read CarbonSession/SDK written data Key: CARBONDATA-3155 URL: https://issues.apache.org/jira/browse/CARBONDATA-3155 Project: CarbonData Issue Type: Improvement Reporter: xubo245 Assignee: xubo245 DataFrame support read CarbonSession/SDK written data -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2981: [CARBONDATA-3154] Fix spark-2.1 test error
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2981 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9921/ ---
[GitHub] carbondata issue #2981: [CARBONDATA-3154] Fix spark-2.1 test error
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2981 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1873/ ---
[GitHub] carbondata issue #2981: [CARBONDATA-3154] Fix spark-2.1 test error
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2981 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1661/ ---
[GitHub] carbondata issue #2981: [CARBONDATA-3154] Fix spark-2.1 test error
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/2981 retest this please ---
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
Github user qiuchenjian commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2980#discussion_r239991055

--- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java ---
@@ -34,8 +37,12 @@
   private int numberOfColumns;

   public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configuration) {
-    String[] complexDelimiters =
+    String[] tempComplexDelimiters =
         (String[]) configuration.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS);
+    Queue<String> complexDelimiters = new LinkedList<>();
+    for (int i = 0; i < 4; i++) {
--- End diff --

"i < 4": the 4 is not clear; it's better to replace it with a named constant ---
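The review suggestion above can be sketched as follows (the constant name and the helper method are illustrative assumptions, not the actual CarbonData patch):

```java
import java.util.LinkedList;
import java.util.Queue;

public class ComplexDelimiterQueue {
  // Named constant instead of a bare "4", per the review comment above.
  // The name and value here are illustrative assumptions.
  public static final int NUMBER_OF_COMPLEX_DELIMITERS = 4;

  // Build the delimiter queue the way the diff's loop does, with the
  // magic number replaced by the constant.
  public static Queue<String> buildDelimiterQueue(String[] tempComplexDelimiters) {
    Queue<String> complexDelimiters = new LinkedList<>();
    for (int i = 0; i < NUMBER_OF_COMPLEX_DELIMITERS; i++) {
      complexDelimiters.add(tempComplexDelimiters[i]);
    }
    return complexDelimiters;
  }
}
```

A named constant documents why the loop stops at 4 (one delimiter per nesting level) and gives a single place to change if more levels are ever supported.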
[GitHub] carbondata issue #2981: [CARBONDATA-3154] Fix spark-2.1 test error
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2981 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1872/ ---
[GitHub] carbondata issue #2981: [CARBONDATA-3154] Fix spark-2.1 test error
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2981 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9920/ ---
[GitHub] carbondata issue #2981: [CARBONDATA-3154] Fix spark-2.1 test error
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2981 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1660/ ---
[GitHub] carbondata issue #2981: [CARBONDATA-3154] Fix spark-2.1 test error
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2981 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1659/ ---
[GitHub] carbondata pull request #2981: [CARBONDATA-3154] Fix spark-2.1 test error
GitHub user xubo245 opened a pull request: https://github.com/apache/carbondata/pull/2981 [CARBONDATA-3154] Fix spark-2.1 test error [CARBONDATA-3154] Fix spark-2.1 test error This PR fixes the spark-2.1 test errors, including: 1. fixing 6 errors in org.apache.spark.sql.carbondata.datasource.SparkCarbonDataSourceTest Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? No - [ ] Any backward compatibility impacted? No - [ ] Document update required? No - [ ] Testing done: fixed test code - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. No You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/carbondata CARBONDATA-3154_FixSpark2_1_0TestError Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2981.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2981 commit 82ce986f500e8412c7cc515814f0fffb84b26890 Author: xubo245 Date: 2018-12-07T16:01:43Z [CARBONDATA-3154] Fix spark-2.1 test error ---
[GitHub] carbondata issue #2978: [WIP] Added lazy load and direct vector fill support...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2978 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9918/ ---
[GitHub] carbondata issue #2978: [WIP] Added lazy load and direct vector fill support...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2978 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1870/ ---
[GitHub] carbondata issue #2980: [CARBONDATA-3017] Map DDL Support
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2980 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1869/ ---
[GitHub] carbondata issue #2980: [CARBONDATA-3017] Map DDL Support
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2980 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9917/ ---
[GitHub] carbondata issue #2979: [CARBONDATA-3153] Complex delimiters change
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2979 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1868/ ---
[GitHub] carbondata issue #2979: [CARBONDATA-3153] Complex delimiters change
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2979 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9916/ ---
[GitHub] carbondata issue #2980: [CARBONDATA-3017] Map DDL Support
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2980 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1657/ ---
[GitHub] carbondata issue #2978: [WIP] Added lazy load and direct vector fill support...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2978 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1658/ ---
[jira] [Created] (CARBONDATA-3154) Fix spark-2.1 test error
xubo245 created CARBONDATA-3154: --- Summary: Fix spark-2.1 test error Key: CARBONDATA-3154 URL: https://issues.apache.org/jira/browse/CARBONDATA-3154 Project: CarbonData Issue Type: Bug Affects Versions: 1.5.1 Reporter: xubo245 Assignee: xubo245 Fix For: 1.5.2

Currently the CI only compiles against Spark 2.1 and does not run the Spark 2.1 UTs, so testing with Spark 2.1 surfaces several errors. For example, the command:
{code:java}
-Pspark-2.1 clean install
{code}
produces error 1:
{code:java}
2018-12-07 21:47:20 INFO HiveMetaStore:746 - 0: get_database: global_temp
2018-12-07 21:47:20 INFO audit:371 - ugi=xubo ip=unknown-ip-addr cmd=get_database: global_temp
2018-12-07 21:47:20 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
*** RUN ABORTED ***
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'location' expecting {<EOF>, '(', '.', 'SELECT', 'FROM', 'AS', 'WITH', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE', 'OPTIONS', 'CLUSTERED', 'PARTITIONED'}(line 1, pos 150)

== SQL ==
create table par_table(male boolean, age int, height double, name string, address string, salary long, floatField float, bytefield byte) using parquet location '/Users/xubo/Desktop/xubo/git/carbondata1/integration/spark-datasource/target/warehouse2'
--^^^
  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:45)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
  at org.apache.spark.sql.carbondata.datasource.SparkCarbonDataSourceTest.createParquetTable(SparkCarbonDataSourceTest.scala:1126)
  at org.apache.spark.sql.carbondata.datasource.SparkCarbonDataSourceTest.beforeAll(SparkCarbonDataSourceTest.scala:1359)
  at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
  at org.apache.spark.sql.carbondata.datasource.SparkCarbonDataSourceTest.beforeAll(SparkCarbonDataSourceTest.scala:38)
  at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
  ...
[INFO] [INFO] Reactor Summary:
{code}
There are another 5 errors in org.apache.spark.sql.carbondata.datasource.SparkCarbonDataSourceTest -- This message was sent by Atlassian JIRA (v7.6.3#76005)
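For context on the parse error above: the Spark 2.1 SQL parser does not accept a LOCATION clause on `CREATE TABLE ... USING` datasource tables (that syntax arrived in later Spark releases), which is why 'location' is rejected. A hedged sketch of the usual Spark 2.1-compatible form, passing the path through OPTIONS instead (the path value is illustrative):

{code:sql}
-- Accepted by the Spark 2.1 parser: path via OPTIONS instead of LOCATION
CREATE TABLE par_table(male BOOLEAN, age INT, name STRING)
USING parquet
OPTIONS (path '/tmp/warehouse2/par_table')
{code}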
[GitHub] carbondata pull request #2980: [CARBONDATA-3017] Map DDL Support
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/2980 [CARBONDATA-3017] Map DDL Support Support Create DDL for Map type. This PR is dependant on PR#2979 for the change of delimiters. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishnalla1994/carbondata MapDDL5Dec Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2980.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2980 commit 322b52a64c840317a6905664e8de16327e3635e0 Author: Manish Nalla Date: 2018-10-16T09:48:08Z MapDDLSupport commit 3d119888a80e7d8f9cab59e477984b56af1309f6 Author: manishnalla1994 Date: 2018-12-07T08:18:31Z Added Testcases and Local Dict Support commit 5fe06801360fc04bab9c1239ea8d007f37bc69d4 Author: manishnalla1994 Date: 2018-12-07T13:28:54Z Test Files for Map commit 4cc8ba13b234a13b9a3cef541e37f492153e7d1b Author: manishnalla1994 Date: 2018-12-07T14:44:12Z Changed TestCases and Supported 2 new delimiters ---
[GitHub] carbondata issue #2979: [CARBONDATA-3153] Complex delimiters change
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2979 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1656/ ---
[GitHub] carbondata pull request #2979: [CARBONDATA-3153] Complex delimiters change
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/2979 [CARBONDATA-3153] Complex delimiters change Changed the two Complex Delimiters used to '\001' and '\002'. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishnalla1994/carbondata ComplexDelimiters Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2979.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2979 commit 7cfa05fbf65b5b176fe94ce6c36e4deb10a2a437 Author: manishnalla1994 Date: 2018-12-07T09:25:58Z Delimiters changed commit bcf265316627f49862a994292bb37169afe40403 Author: manishnalla1994 Date: 2018-12-07T13:46:57Z Change of 2 complex delimiters ---
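To illustrate why non-printable delimiters like '\001' and '\002' are used for complex types, here is a minimal sketch (not CarbonData's actual parser): the level-1 delimiter separates elements of the outer collection and the level-2 delimiter separates fields within each element, so user data containing commas or '$' no longer collides with the delimiters.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimiterSketch {
  static final String LEVEL_1 = "\001";  // outer collection delimiter
  static final String LEVEL_2 = "\002";  // inner struct/field delimiter

  // Parse e.g. "a\002b\001c\002d" into [[a, b], [c, d]].
  static List<List<String>> parse(String row) {
    List<List<String>> out = new ArrayList<>();
    for (String element : row.split(LEVEL_1, -1)) {
      List<String> fields = new ArrayList<>();
      for (String f : element.split(LEVEL_2, -1)) {
        fields.add(f);
      }
      out.add(fields);
    }
    return out;
  }
}
```

Because '\001' and '\002' are control characters that essentially never appear in real text, the split is unambiguous even for values like "New York, NY".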
[jira] [Created] (CARBONDATA-3153) Change of Complex Delimiters
MANISH NALLA created CARBONDATA-3153: Summary: Change of Complex Delimiters Key: CARBONDATA-3153 URL: https://issues.apache.org/jira/browse/CARBONDATA-3153 Project: CarbonData Issue Type: Bug Reporter: MANISH NALLA Assignee: MANISH NALLA -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3005) Supporting Gzip as Column Compressor
[ https://issues.apache.org/jira/browse/CARBONDATA-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shardul Singh updated CARBONDATA-3005: -- Summary: Supporting Gzip as Column Compressor (was: Proposing Gzip Compression support) > Supporting Gzip as Column Compressor > > > Key: CARBONDATA-3005 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3005 > Project: CarbonData > Issue Type: New Feature >Reporter: Shardul Singh >Assignee: Shardul Singh >Priority: Minor > > Currently CarbonData uses Snappy as the default codec to compress its columnar > files; besides Snappy, CarbonData also supports zstd. This issue targets support for: > 1. the Gzip compression codec. > Benefits of Gzip: > # Gzip offers reduced file size compared to other codecs like Snappy, but at > the cost of processing speed. > # Gzip is suitable for users with cold data, i.e. data that is stored > permanently and will be queried rarely. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
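For reference, CarbonData already selects the column compressor per table through a table property (the property name below follows the existing Snappy/zstd convention; 'gzip' as a value is what this issue proposes, so treat this as a sketch, not confirmed syntax):

{code:sql}
CREATE TABLE cold_sales (id INT, name STRING)
STORED AS carbondata
TBLPROPERTIES ('carbon.column.compressor'='gzip')
{code}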
[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2977 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1867/ ---
[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2977 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9915/ ---
[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...
Github user qiuchenjian commented on the issue: https://github.com/apache/carbondata/pull/2975 @dhatchayani I read DimensionRawColumnChunk; this class already caches the decoded DimensionColumnPage. Is the Map you added useful?

  public DimensionColumnPage decodeColumnPage(int pageNumber) {
    assert pageNumber < pagesCount;
    if (dataChunks == null) {
      dataChunks = new DimensionColumnPage[pagesCount];
    }
    if (dataChunks[pageNumber] == null) {
      try {
        dataChunks[pageNumber] = chunkReader.decodeColumnPage(this, pageNumber, null);
      } catch (IOException | MemoryException e) {
        throw new RuntimeException(e);
      }
    }
    return dataChunks[pageNumber];
  }
 ---
[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...
Github user dhatchayani commented on the issue: https://github.com/apache/carbondata/pull/2975 @qiuchenjian Actually, DimensionColumnPage is decoded from DimensionRawColumnChunk, so it is meaningless to cache it in DimensionRawColumnChunk. The two serve different purposes. ---
[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2977 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1655/ ---
[GitHub] carbondata issue #2975: [WIP][CARBONDATA-3145] Read improvement for complex ...
Github user qiuchenjian commented on the issue: https://github.com/apache/carbondata/pull/2975 What if the decoded DimensionColumnPage instances were cached in DimensionRawColumnChunk, so that the cache could be used by other code as well? For example, DimensionRawColumnChunk could contain a Map cacheDimPages; when a page is decoded, it would be cached in this map. ---
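The lazy per-page cache pattern being discussed can be sketched as follows (class and method names here are hypothetical stand-ins, not CarbonData APIs): each page is decoded at most once, and later calls return the cached copy.

```java
// Minimal sketch of decodeColumnPage-style lazy caching: a slot array
// indexed by page number, filled on first access.
public class PageCache {
  private final String[] pages;  // stands in for DimensionColumnPage[]
  private int decodeCalls = 0;   // counts how often real decode work ran

  public PageCache(int pageCount) {
    this.pages = new String[pageCount];
  }

  // Simulated expensive decode step.
  private String decode(int pageNumber) {
    decodeCalls++;
    return "page-" + pageNumber;
  }

  // Decode lazily; repeated calls for the same page hit the cache.
  public String decodeColumnPage(int pageNumber) {
    if (pages[pageNumber] == null) {
      pages[pageNumber] = decode(pageNumber);
    }
    return pages[pageNumber];
  }

  public int getDecodeCalls() {
    return decodeCalls;
  }
}
```

This is the same shape as the existing dataChunks array in DimensionRawColumnChunk, which is why the reviewer questions whether an additional Map-based cache adds anything.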
[GitHub] carbondata issue #2977: [WIP] [CARBONDATA-3147] Fixed concurrent load issue
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2977 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1654/ ---