[GitHub] incubator-carbondata pull request #317: added test case for file system
GitHub user anubhav100 opened a pull request:
https://github.com/apache/incubator-carbondata/pull/317

added test case for file system

Be sure to do all of the following to help us incorporate your contribution quickly and easily:
- [ ] Make sure the PR title is formatted like: `[CARBONDATA-] Description of pull request`
- [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes.)
- [ ] Replace `` in the title with the actual Jira issue number, if there is one.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).
- [ ] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - What manual testing you have done?
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

---
You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anubhav100/incubator-carbondata CARBONDATA-410

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/317.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #317

commit 14ebd6ddae9621580582e1b42dc5e47457a3455c
Author: anubhav100
Date: 2016-11-15T06:18:51Z

added test case for file system

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[jira] [Created] (CARBONDATA-410) Implement test cases for core.datastore.file system
SWATI RAO created CARBONDATA-410:
Summary: Implement test cases for core.datastore.file system
Key: CARBONDATA-410
URL: https://issues.apache.org/jira/browse/CARBONDATA-410
Project: CarbonData
Issue Type: Task
Reporter: SWATI RAO

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[Feature] proposal for update and delete support in Carbon data
Hi All,

I would like to propose the following new features in CarbonData:
1) An UPDATE statement to modify existing records in a CarbonData table
2) A DELETE statement to remove records from a CarbonData table

A) Update operation:
The update feature can be added to CarbonData using intermediate delta files (delete delta and update delta files), with less impact on the existing code. An update can be treated as a 'delete' followed by an 'insert'. Once an update is done on a CarbonData file, the CarbonData store reader can, during a select query, use the delete delta data cache to exclude the deleted records in that segment and then include the records from the newly added update delta files.

B) Delete operation:
For a delete operation, a delete delta file will be added to each segment that contains matching records. During a select query, the CarbonData reader will exclude those deleted records from the result set.

Please share your suggestions and thoughts on the design and functional aspects of this feature. I'll share a detailed design document later.

Regards
Vinod
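A minimal sketch of the read-side merge described in the proposal, assuming each row in a segment carries a row ID, the delete delta is a set of deleted row IDs, and the update delta holds the new versions of updated rows. `DeltaMergeSketch`, `Row` and `read` are hypothetical names for illustration only, not CarbonData classes; real delta files would be read from storage rather than held in memory (records need Java 16+).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not CarbonData code: how a reader could combine a
// segment's base rows with a delete delta and an update delta.
public class DeltaMergeSketch {
    // A row identified by its row ID within the segment.
    record Row(long rowId, String value) {}

    // Returns the rows visible to a select query:
    // base rows not listed in the delete delta, followed by update-delta rows.
    static List<Row> read(List<Row> base, Set<Long> deleteDelta, List<Row> updateDelta) {
        List<Row> visible = new ArrayList<>();
        for (Row r : base) {
            if (!deleteDelta.contains(r.rowId())) {
                visible.add(r);
            }
        }
        visible.addAll(updateDelta);
        return visible;
    }

    public static void main(String[] args) {
        List<Row> base = List.of(new Row(0, "a"), new Row(1, "b"), new Row(2, "c"));

        // UPDATE of row 1 = delete row 1 + insert its new version.
        List<Row> afterUpdate = read(base, Set.of(1L), List.of(new Row(3, "b2")));
        System.out.println(afterUpdate.stream().map(Row::value).toList()); // [a, c, b2]

        // DELETE of row 0 only needs a delete delta.
        List<Row> afterDelete = read(base, Set.of(0L), List.of());
        System.out.println(afterDelete.stream().map(Row::value).toList()); // [b, c]
    }
}
```

Modeling update as delete plus insert keeps the original segment files untouched; the cost moves to query time, where every scan has to consult the delta files.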
[GitHub] incubator-carbondata pull request #316: [WIP]create agg table segment for ev...
Github user Jay357089 closed the pull request at: https://github.com/apache/incubator-carbondata/pull/316
Re: [RESULT][VOTE] Apache CarbonData 0.2.0-incubating release
Sorry for coming late on this. Here is my +1 (binding) too. I will carry my +1 to the incubator list as well.

Regards,
Uma

On Sun, Nov 13, 2016 at 6:20 AM, Liang Chen wrote:
> Hi
>
> The PPMC vote has passed; the result is as below:
> +1 (binding): 6
> +1 (non-binding): 6
> Thanks all for your vote.
>
> Regards
> Liang
>
> Liang Chen wrote
> > Hi all,
> >
> > I submit CarbonData 0.2.0-incubating for your vote.
> >
> > Release Notes:
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12337896
> >
> > Staging Repository:
> > https://repository.apache.org/content/repositories/orgapachecarbondata-1006
> >
> > Git Tag:
> > carbondata-0.2.0-incubating
> >
> > Please vote to approve this release:
> > [ ] +1 Approve the release
> > [ ] -1 Don't approve the release (please provide specific comments)
> >
> > This vote will be open for at least 72 hours. If this vote passes (we need at least 3 binding votes, meaning three votes from the PPMC), I will forward to general@.apache for the IPMC votes.
> >
> > Here is my vote: +1 (binding)
> >
> > Regards
> > Liang
>
> --
> View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/VOTE-Apache-CarbonData-0-2-0-incubating-release-tp2823p2881.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
RE: Single Pass Data Load Design
Hi Ravindra,

Thank you for putting together a proposal for improving the data load process! Please find my comments inline in the Google doc.

Jihong

-----Original Message-----
From: Ravindra Pesala [mailto:ravi.pes...@gmail.com]
Sent: Sunday, November 13, 2016 4:24 AM
To: dev
Subject: Single Pass Data Load Design

Hi All,

Please find the proposed solutions for single pass data load:
https://docs.google.com/document/d/1_sSN9lccCZo4E_X3pNP5PchQACqif3AOXKTuG-YJAcc/edit?usp=sharing

--
Thanks & Regards,
Ravindra
[GitHub] incubator-carbondata pull request #316: [WIP]create agg table segment for ev...
GitHub user Jay357089 opened a pull request:
https://github.com/apache/incubator-carbondata/pull/316

[WIP]create agg table segment for every fact table single segment

---
You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Jay357089/incubator-carbondata createAggTable

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/316.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #316

commit 92da6a1313ef5eaa532e10d46828d0e6791a4119
Author: Jay357089
Date: 2016-11-10T07:01:37Z

create agg table segment for every fact table single segment
Re: [Vote] Please provide valuable feedback's and vote for Like filter query performance optimization
+1

On Mon, Nov 14, 2016, 3:54 PM sujith chacko wrote:
> Hi Liang,
> Yes, it's for high-cardinality columns.
> Thanks,
> Sujith
Re: [Vote] Please provide valuable feedback's and vote for Like filter query performance optimization
Hi Liang,

Yes, it's for high-cardinality columns.

Thanks,
Sujith

On Nov 14, 2016 2:01 PM, "Liang Chen" wrote:
> Hi
>
> I have one query: for no-dictionary columns with high cardinality, like phone numbers, is the pruning cost high or not?
>
> Regards
> Liang
[jira] [Created] (CARBONDATA-409) Drop non-existing macro executes successfully while it must give an error.
Sangeeta Gulia created CARBONDATA-409:
Summary: Drop non-existing macro executes successfully while it must give an error.
Key: CARBONDATA-409
URL: https://issues.apache.org/jira/browse/CARBONDATA-409
Project: CarbonData
Issue Type: Bug
Components: data-query
Reporter: Sangeeta Gulia

I created a macro:

CREATE TEMPORARY MACRO simple_add (x int, y int) x + y;

Then I dropped the macro:

hive> drop temporary macro simple_add;
OK
Time taken: 0.038 seconds
hive> select simple_add(2,3);
FAILED: SemanticException [Error 10011]: Line 1:7 Invalid function 'simple_add'

Then I tried to drop the same macro again, and it executed without any exception:

hive> drop temporary macro simple_add;
OK
Time taken: 0.016 seconds
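A minimal sketch of the behavior the reporter expects, using a hypothetical in-memory registry (`MacroRegistrySketch` is illustrative only and is not Hive's or CarbonData's implementation): dropping a macro that no longer exists raises an error. The `ifExists` flag is an assumption modeled on the usual SQL `IF EXISTS` convention, showing how a caller could opt out of the error.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.IntBinaryOperator;

// Hypothetical sketch: a temporary-macro registry whose drop fails
// for unknown names unless the caller asks for IF EXISTS semantics.
public class MacroRegistrySketch {
    private final Map<String, IntBinaryOperator> macros = new HashMap<>();

    void create(String name, IntBinaryOperator body) {
        macros.put(name.toLowerCase(Locale.ROOT), body);
    }

    int call(String name, int x, int y) {
        IntBinaryOperator m = macros.get(name.toLowerCase(Locale.ROOT));
        if (m == null) {
            throw new IllegalArgumentException("Invalid function '" + name + "'");
        }
        return m.applyAsInt(x, y);
    }

    // Throws when the macro is absent, unless ifExists is true.
    void drop(String name, boolean ifExists) {
        boolean removed = macros.remove(name.toLowerCase(Locale.ROOT)) != null;
        if (!removed && !ifExists) {
            throw new IllegalArgumentException("Macro '" + name + "' does not exist");
        }
    }

    public static void main(String[] args) {
        MacroRegistrySketch r = new MacroRegistrySketch();
        r.create("simple_add", (x, y) -> x + y);        // CREATE TEMPORARY MACRO simple_add ...
        System.out.println(r.call("simple_add", 2, 3)); // 5
        r.drop("simple_add", false);                    // first drop succeeds
        try {
            r.drop("simple_add", false);                // second drop should fail
        } catch (IllegalArgumentException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```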
Re: [Vote] Please provide valuable feedback's and vote for Like filter query performance optimization
+1

Hi Liang,

The pruning cost won't be high, as block pruning will be done at the complete B-tree level, and it will improve query performance for no-dictionary columns.

-Regards
Kumar Vishal

On Nov 14, 2016 14:01, "Liang Chen" wrote:
> Hi
>
> I have one query: for no-dictionary columns with high cardinality, like phone numbers, is the pruning cost high or not?
>
> Regards
> Liang
Re: Single Pass Data Load Design
Hi,

Yes, good feature. This improvement would significantly improve data load performance. Can you provide a sequence diagram for the whole data load process?

Regards
Liang

2016-11-14 15:42 GMT+08:00 Jacky Li:
> Hi Ravindra,
>
> Thanks for proposing this design. It would be really exciting if CarbonData could do a 1-pass solution for loading. I have given some comments in the design document.
>
> Regards,
> Jacky
>
> --
> View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Single-Pass-Data-Load-Design-tp2875p2894.html

--
Regards
Liang
[GitHub] incubator-carbondata pull request #313: [CARBONDATA-405]Fixed Data load fail...
Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/313
Re: [Vote] Please provide valuable feedback's and vote for Like filter query performance optimization
Hi

I have one query: for no-dictionary columns with high cardinality, like phone numbers, is the pruning cost high or not?

Regards
Liang

2016-11-14 15:18 GMT+08:00 sujith chacko:
> Hi All,
>
> I am going to optimize the LIKE filter query flow for no-dictionary columns. Please find the details below.
>
> *Current design:*
> For LIKE filter queries, no pushdown happens to the carbon layer. Because of this, no block/blocklet-level pruning can happen before the LIKE filters are applied, which adds overhead while scanning, since the system has to scan all the blocks and blocklets in order to apply the filters.
>
> *Proposed design/solution:*
> LIKE filters (startsWith, endsWith, contains) can be pushed down to the carbon engine layer so that carbon can perform block- and blocklet-level pruning before applying the filters.
> Block-level pruning will happen on the driver side, and blocklet-level pruning will be done in the executor, as per the existing design.
>
> Requesting all to please provide valuable feedback and vote for implementing the above solution in order to improve LIKE filter queries.
>
> Thanks,
> Sujith
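A minimal sketch of how min/max statistics could prune blocklets for a startsWith-style LIKE filter (for example `phone_no LIKE '159%'`), assuming each blocklet stores the min and max value of a no-dictionary string column, compared with the same ordering used when the statistics were built. `LikePrefixPruningSketch`, `BlockletStats` and the column name `phone_no` are hypothetical. The thread does not say how endsWith or contains would prune, and lexicographic min/max ranges constrain prefixes rather than suffixes or substrings, so only the startsWith case is shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not CarbonData code: prune blocklets for
// "column LIKE 'prefix%'" using per-blocklet min/max statistics.
public class LikePrefixPruningSketch {
    record BlockletStats(String id, String min, String max) {}

    // Conservative check: returns false only when no value in [min, max]
    // can start with the given prefix, so the blocklet is safe to skip.
    static boolean mayContainPrefix(BlockletStats s, String prefix) {
        if (s.max().compareTo(prefix) < 0) {
            return false; // every value sorts before the prefix itself
        }
        // If min sorts after the prefix but does not start with it, then min
        // (and every larger value) sorts after all strings starting with prefix.
        return s.min().compareTo(prefix) <= 0 || s.min().startsWith(prefix);
    }

    static List<String> prune(List<BlockletStats> blocklets, String prefix) {
        List<String> kept = new ArrayList<>();
        for (BlockletStats s : blocklets) {
            if (mayContainPrefix(s, prefix)) {
                kept.add(s.id());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<BlockletStats> stats = List.of(
                new BlockletStats("b0", "1300000000", "1399999999"),
                new BlockletStats("b1", "1500000000", "1599999999"),
                new BlockletStats("b2", "1590000000", "1700000000"),
                new BlockletStats("b3", "1600000000", "1699999999"));
        // phone_no LIKE '159%': b0 and b3 are skipped without being scanned.
        System.out.println(prune(stats, "159")); // [b1, b2]
    }
}
```

The check is conservative: a kept blocklet may still hold no match (b1 could contain only numbers starting with 150), so the LIKE filter itself still has to run on the rows that survive pruning.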