[jira] [Created] (CARBONDATA-402) carbon should support CreateAsSelect
Jay created CARBONDATA-402: -- Summary: carbon should support CreateAsSelect Key: CARBONDATA-402 URL: https://issues.apache.org/jira/browse/CARBONDATA-402 Project: CarbonData Issue Type: Improvement Reporter: Jay Priority: Minor Provide support for CreateAsSelect; the syntax follows the Hive syntax, for example: CREATE TABLE table4 STORED BY 'carbondata' AS SELECT * FROM table3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-carbondata pull request #309: [WIP]support CreateAsSelect
GitHub user Jay357089 opened a pull request: https://github.com/apache/incubator-carbondata/pull/309 [WIP]support CreateAsSelect Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[CARBONDATA-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - What manual testing you have done? - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/Jay357089/incubator-carbondata CreateAsSelect Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/309.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #309 commit f0edae8d1a6d68c3d8020789753ff7cc91ec3179 Author: Jay357089 Date: 2016-11-10T07:01:37Z support CreateAsSelect --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
GC problem and performance refine problem
Hi, We are using CarbonData to build our tables and running queries through CarbonContext, and we have hit some performance problems while tuning the system.

Background:
cluster: 100 executors, 5 tasks/executor, 10 GB memory/executor
data: 60+ GB (per replica) in CarbonData format, 600+ MB/file * 100 files, 300+ columns, 300+ million rows
sql example:
select A, sum(a), sum(b), sum(c), ... (about 100 more aggregations like sum(column)) from Table1 LATERAL VIEW explode(split(Aarray, ';')) ATable AS A where A is not null and d > "ab:c-10" and d < "h:0f3s" and e != 10 and f = 22 and g = 33 and h = 44 GROUP BY A
target query time: < 10s
current query time: 15s ~ 25s
scene: OLAP system with fewer than 100 queries per day, so concurrency is low. The CPU is idle most of the time, and the service shares machines with other programs. The service runs for a long time, so we cannot reserve very large memory for every executor.

Tuning done so far: I have built an index and dictionary on d, e, f, g, h, and built a dictionary on all the other aggregation columns (i.e. a, b, c, ... 100+ columns), and made sure all data is in a single segment. Speculation is enabled (quantile=0.5, interval=250, multiplier=1.2). Time is mainly spent in the first stage, before the shuffle; since 95% of the data is filtered out, the shuffle itself takes little time. In the first stage most tasks complete in under 10s, but close to 50 tasks take longer than 10s, and the slowest task of a query may take 12~16s.

Problems:
1. GC. Even after a lot of parameter tuning, some tasks in the first stage spend 20%~30% of their time in GC. We use G1 on Java 8; GC time doubles with CMS. Most GC time goes to young-generation collection, and almost half of the young generation is copied to the old generation each cycle. It seems many objects live longer than one GC period and their space is not reused (the concurrent GC releases it later). With a large Eden (>= 1 GB, for example), a single GC pause takes seconds; with a small Eden (256 MB, for example), each pause takes hundreds of milliseconds, but pauses are more frequent and the total is still seconds. Is there any way to reduce the GC time? (We are not considering the first and second query in this case.)
2. Performance skew. The number of rows left after filtering is not uniform; some nodes are heavily loaded and spend more time than others, so task times range from 4s to 16s. Is there a method to even this out?
3. The first and second query take too long. I know the dictionary and some indexes need to be loaded the first time, but even after trying to warm the cache with the query below, it still takes a lot of time. How can I warm the cache correctly?
select Aarray, a, b, c ... from Table1 where Aarray is not null and d = "sss" and e != 22 and f = 33 and g = 44 and h = 55
4. Any other suggestions for reducing the query time?

Some suggestions: the log from the QueryStatisticsRecorder class gives me a good way to find the bottleneck, but it is not enough. Some additional metrics would be very useful:
1. Filter ratio: report not only result_size but also the original size, so we know how much data was filtered out.
2. I/O time. scan_blocks_time is not enough: if it is high we know something is wrong, but not what caused the problem. The real I/O time for the data is not reported. Since there may be several files per partition, knowing whether the slowness comes from the datanode or from the executor itself would give us an intuition about the problem.
3. The TableBlockInfo per task. I log it myself when debugging; it tells me how many blocklets are local. The Spark web UI only gives a locality level, yet possibly only one blocklet is actually local.
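For problem 1, one direction worth trying (a sketch only; the flag values are assumptions that need benchmarking on your cluster, not CarbonData recommendations) is to give G1 a pause-time goal and let it size the young generation itself, rather than pinning Eden by hand:

```shell
# Hypothetical spark-submit fragment: cap the G1 pause target and let the
# collector adapt Eden, instead of fixing it at 256M or 1G manually.
# GC logging is enabled so pause frequency and promotion can be measured.
spark-submit \
  --executor-memory 10g \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC \
      -XX:MaxGCPauseMillis=200 \
      -XX:InitiatingHeapOccupancyPercent=35 \
      -XX:+PrintGCDetails -XX:+PrintGCDateStamps" \
  ...
```

Since the symptom is heavy promotion of young objects that live just longer than one GC cycle, comparing promotion rates in the GC log under different pause targets is the quickest way to see whether tuning helps or whether the allocation pattern itself must change.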
Re: [VOTE] Apache CarbonData 0.2.0-incubating release
+1 Regards, Aniket On 9 Nov 2016 3:17 p.m., "Liang Chen" wrote: > Hi all, > > I submit the CarbonData 0.2.0-incubating to your vote. > > Release Notes: > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12337896 > > Staging Repository: > https://repository.apache.org/content/repositories/orgapachecarbondata-1006 > > Git Tag: > carbondata-0.2.0-incubating > > Please vote to approve this release: > [ ] +1 Approve the release > [ ] -1 Don't approve the release (please provide specific comments) > > This vote will be open for at least 72 hours. If this vote passes (we need > at least 3 binding votes, meaning three votes from the PPMC), I will > forward to gene...@incubator.apache.org for the IPMC votes. > > Here is my vote : +1 (binding) > > Regards > Liang >
[jira] [Created] (CARBONDATA-401) Look forward to support reading csv file only once in data loading
Lionx created CARBONDATA-401: Summary: Look forward to support reading csv file only once in data loading Key: CARBONDATA-401 URL: https://issues.apache.org/jira/browse/CARBONDATA-401 Project: CarbonData Issue Type: Improvement Reporter: Lionx Assignee: Lionx Currently, in the Carbon data loading module, global dictionary generation is an independent step: Carbon reads the CSV file twice, once to generate the global dictionary and once to load the carbon data. We would like to read the CSV file only once.
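As a toy illustration of the single-pass idea (not CarbonData's actual loading code; the class and method names are invented), dictionary codes can be assigned lazily while rows are encoded, so dictionary generation and data loading share one CSV scan instead of two:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a surrogate key is assigned the first time a value
// is seen, so the dictionary is built during the same pass that encodes rows.
public class SinglePassDictionary {
    private final Map<String, Integer> dict = new LinkedHashMap<>();

    /** Returns the surrogate key for a value, creating one if unseen. */
    public int encode(String value) {
        return dict.computeIfAbsent(value, v -> dict.size() + 1);
    }

    /** Encodes a whole column of raw values in a single pass. */
    public List<Integer> encodeColumn(List<String> rawValues) {
        List<Integer> encoded = new ArrayList<>(rawValues.size());
        for (String v : rawValues) {
            encoded.add(encode(v));
        }
        return encoded;
    }

    public int dictionarySize() {
        return dict.size();
    }
}
```

A real implementation would also have to merge per-task dictionaries into a consistent global one, which is exactly the part the current two-pass design sidesteps.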
Re: [VOTE] Apache CarbonData 0.2.0-incubating release
+1 -Vimal On Nov 10, 2016 4:47 AM, "Liang Chen" wrote: > Hi all, > > I submit the CarbonData 0.2.0-incubating to your vote. > > Release Notes: > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12337896 > > Staging Repository: > https://repository.apache.org/content/repositories/orgapachecarbondata-1006 > > Git Tag: > carbondata-0.2.0-incubating > > Please vote to approve this release: > [ ] +1 Approve the release > [ ] -1 Don't approve the release (please provide specific comments) > > This vote will be open for at least 72 hours. If this vote passes (we need > at least 3 binding votes, meaning three votes from the PPMC), I will > forward to gene...@incubator.apache.org for the IPMC votes. > > Here is my vote : +1 (binding) > > Regards > Liang >
Re: RE: [VOTE] Apache CarbonData 0.2.0-incubating release
+1 -Regards Kumar Vishal On Nov 10, 2016 07:48, "Ravindra Pesala" wrote: > +1 > > On Thu, Nov 10, 2016, 7:07 AM Jay <2550062...@qq.com> wrote: > > > +1 > > > > > > Regards > > Jay > > > > > > -- Original Message -- > > From: "Jihong Ma";; > > Sent: Thursday, November 10, 2016, 7:58 AM > > To: "dev@carbondata.incubator.apache.org"< > > dev@carbondata.incubator.apache.org>; "chenliang...@apache.org"< > > chenliang...@apache.org>; > > > > Subject: RE: [VOTE] Apache CarbonData 0.2.0-incubating release > > > > > > > > +1 binding. > > > > Jihong > > > > -Original Message- > > From: Liang Chen [mailto:chenliang6...@gmail.com] > > Sent: Wednesday, November 09, 2016 3:18 PM > > To: dev@carbondata.incubator.apache.org > > Subject: [VOTE] Apache CarbonData 0.2.0-incubating release > > > > Hi all, > > > > I submit the CarbonData 0.2.0-incubating to your vote. > > > > Release Notes: > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12337896 > > > > Staging Repository: > > https://repository.apache.org/content/repositories/orgapachecarbondata-1006 > > > > Git Tag: > > carbondata-0.2.0-incubating > > > > Please vote to approve this release: > > [ ] +1 Approve the release > > [ ] -1 Don't approve the release (please provide specific comments) > > > > This vote will be open for at least 72 hours. If this vote passes (we need > > at least 3 binding votes, meaning three votes from the PPMC), I will > > forward to gene...@incubator.apache.org for the IPMC votes. > > > > Here is my vote : +1 (binding) > > > > Regards > > Liang >
Re: RE: [VOTE] Apache CarbonData 0.2.0-incubating release
+1 On Thu, Nov 10, 2016, 7:07 AM Jay <2550062...@qq.com> wrote: > +1 > > > Regards > Jay > > > -- Original Message -- > From: "Jihong Ma";; > Sent: Thursday, November 10, 2016, 7:58 AM > To: "dev@carbondata.incubator.apache.org"< > dev@carbondata.incubator.apache.org>; "chenliang...@apache.org"< > chenliang...@apache.org>; > > Subject: RE: [VOTE] Apache CarbonData 0.2.0-incubating release > > > > +1 binding. > > Jihong > > -Original Message- > From: Liang Chen [mailto:chenliang6...@gmail.com] > Sent: Wednesday, November 09, 2016 3:18 PM > To: dev@carbondata.incubator.apache.org > Subject: [VOTE] Apache CarbonData 0.2.0-incubating release > > Hi all, > > I submit the CarbonData 0.2.0-incubating to your vote. > > Release Notes: > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12337896 > > Staging Repository: > https://repository.apache.org/content/repositories/orgapachecarbondata-1006 > > Git Tag: > carbondata-0.2.0-incubating > > Please vote to approve this release: > [ ] +1 Approve the release > [ ] -1 Don't approve the release (please provide specific comments) > > This vote will be open for at least 72 hours. If this vote passes (we need > at least 3 binding votes, meaning three votes from the PPMC), I will > forward to gene...@incubator.apache.org for the IPMC votes. > > Here is my vote : +1 (binding) > > Regards > Liang
Re: RE: [VOTE] Apache CarbonData 0.2.0-incubating release
+1 Regards Jay -- Original Message -- From: "Jihong Ma"; Sent: Thursday, November 10, 2016, 7:58 AM To: "dev@carbondata.incubator.apache.org"; "chenliang...@apache.org"; Subject: RE: [VOTE] Apache CarbonData 0.2.0-incubating release +1 binding. Jihong -Original Message- From: Liang Chen [mailto:chenliang6...@gmail.com] Sent: Wednesday, November 09, 2016 3:18 PM To: dev@carbondata.incubator.apache.org Subject: [VOTE] Apache CarbonData 0.2.0-incubating release Hi all, I submit the CarbonData 0.2.0-incubating to your vote. Release Notes: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12337896 Staging Repository: https://repository.apache.org/content/repositories/orgapachecarbondata-1006 Git Tag: carbondata-0.2.0-incubating Please vote to approve this release: [ ] +1 Approve the release [ ] -1 Don't approve the release (please provide specific comments) This vote will be open for at least 72 hours. If this vote passes (we need at least 3 binding votes, meaning three votes from the PPMC), I will forward to gene...@incubator.apache.org for the IPMC votes. Here is my vote : +1 (binding) Regards Liang
Re: As planed, we are ready to make Apache CarbonData 0.2.0 release:
+1 regards Gin Sent from my iPhone > On November 10, 2016, at 3:25 AM, Kumar Vishal wrote: > > +1 > -Regards > Kumar Vishal > >> On Nov 9, 2016 08:04, "Jacky Li" wrote: >> >> +1 >> >> Regards, >> Jacky >> >>> On November 9, 2016, at 9:05 AM, Jay <2550062...@qq.com> wrote: >>> >>> +1 >>> regards >>> Jay >>> >>> >>> >>> >>> -- Original Message -- >>> From: "向志强";; >>> Sent: Wednesday, November 9, 2016, 8:59 AM >>> To: "dev"; >>> >>> Subject: Re: As planed, we are ready to make Apache CarbonData 0.2.0 release: >>> >>> >>> >>> No need to install thrift for building the project is so great. >>> >>> 2016-11-08 23:16 GMT+08:00 QiangCai : >>> I look forward to the release of this version. CarbonData improved query and load performance, and it is good news that there is no need to install thrift to build the project. Btw, how many PRs were merged into this version? -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/As-planed-we-are-ready-to-make-Apache-CarbonData-0-2-0-release-tp2738p2752.html Sent from the Apache CarbonData Mailing List archive mailing list >> archive at Nabble.com. >
RE: [VOTE] Apache CarbonData 0.2.0-incubating release
+1 binding. Jihong -Original Message- From: Liang Chen [mailto:chenliang6...@gmail.com] Sent: Wednesday, November 09, 2016 3:18 PM To: dev@carbondata.incubator.apache.org Subject: [VOTE] Apache CarbonData 0.2.0-incubating release Hi all, I submit the CarbonData 0.2.0-incubating to your vote. Release Notes: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12337896 Staging Repository: https://repository.apache.org/content/repositories/orgapachecarbondata-1006 Git Tag: carbondata-0.2.0-incubating Please vote to approve this release: [ ] +1 Approve the release [ ] -1 Don't approve the release (please provide specific comments) This vote will be open for at least 72 hours. If this vote passes (we need at least 3 binding votes, meaning three votes from the PPMC), I will forward to gene...@incubator.apache.org for the IPMC votes. Here is my vote : +1 (binding) Regards Liang
[VOTE] Apache CarbonData 0.2.0-incubating release
Hi all, I submit the CarbonData 0.2.0-incubating to your vote. Release Notes: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12337896 Staging Repository: https://repository.apache.org/content/repositories/orgapachecarbondata-1006 Git Tag: carbondata-0.2.0-incubating Please vote to approve this release: [ ] +1 Approve the release [ ] -1 Don't approve the release (please provide specific comments) This vote will be open for at least 72 hours. If this vote passes (we need at least 3 binding votes, meaning three votes from the PPMC), I will forward to gene...@incubator.apache.org for the IPMC votes. Here is my vote : +1 (binding) Regards Liang
Re: As planed, we are ready to make Apache CarbonData 0.2.0 release:
+1 -Regards Kumar Vishal On Nov 9, 2016 08:04, "Jacky Li" wrote: > +1 > > Regards, > Jacky > > > On November 9, 2016, at 9:05 AM, Jay <2550062...@qq.com> wrote: > > > > +1 > > regards > > Jay > > > > > > > > > > -- Original Message -- > > From: "向志强";; > > Sent: Wednesday, November 9, 2016, 8:59 AM > > To: "dev"; > > > > Subject: Re: As planed, we are ready to make Apache CarbonData 0.2.0 release: > > > > > > > > No need to install thrift for building the project is so great. > > > > 2016-11-08 23:16 GMT+08:00 QiangCai : > > > >> I look forward to the release of this version. > >> CarbonData improved query and load performance, and it is good news that there is no > >> need to install thrift to build the project. > >> Btw, how many PRs were merged into this version? > >> > >> > >> > >> -- > >> View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/As-planed-we-are-ready-to-make-Apache-CarbonData-0-2-0-release-tp2738p2752.html > >> Sent from the Apache CarbonData Mailing List archive mailing list > archive > >> at Nabble.com. > > > >
[DISCUSS] Improve Statistics and Profiling support
Hi, CarbonData currently has the LOG4J level "STATISTICS" available for logging. However, the information is incomplete for debugging performance problems, and it is not easy to see the statistics and profiling information of one query in one place. So we need to revisit and improve statistics and profiling. I have put down some pointers and we can discuss them.

What to collect
---
1) Statistics of tables/columns, like number of files, number of blocks, number of blocklets.
2) Profiling information required to debug performance issues and resource utilization: scan statistics like row size, number of blocks or blocklets scanned, distribution info, scan buffer size; I/O and CPU/compute cost; driver index effectiveness (number of blocks hit); executor index effectiveness (number of blocklets hit); decoding and decompression cost and memory required; cache statistics (hits, misses, memory occupied); dictionary statistics (number of entries, dictionary load time, memory occupied); Btree statistics (number of entries, Btree load time, lookup cost, memory occupied).
3) Data load: load time, memory required, encode and compress cost.
4) Spark time and shuffle cost.

How to collect:
---
Check whether this can plug into the Spark metrics/counters system. Have a decorator statistics RDD in between to wrap each RDD and collect statistics, or any other method to get this from Spark. Make it pluggable so it can integrate with other processing frameworks and we can get end-to-end statistics. Something like log4j, with clean interfaces to put and retrieve information.

Where to store:
---
In a separate table, or in logs. History information, as it is stored in Spark (maybe JSON). Is Spark's history statistics logging separate enough to use across frameworks? The collector can collect statistics and decide where to store them.

How to see:
---
A command to retrieve various statistics and profiling info. Connecting to other metrics displays like the Spark UI or Ganglia.

Links:
--
Profiling support in Impala: http://www.cloudera.com/documentation/enterprise/5-7-x/topics/impala_explain_plan.html#perf_profile
Table and column statistics in Impala: http://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_perf_stats.html#perf_table_stats
Spark metrics collection: http://spark.apache.org/docs/latest/monitoring.html#metrics

Regards, Venkata Ramana Gollamudi
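A minimal shape for the "log4j-like clean interface" idea above could look like this (a sketch only; the class and metric names are invented for illustration, not an actual CarbonData API). It gathers named metrics per query id so that scan, I/O, and index statistics for one query can be read in one place:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical pluggable recorder: any engine can put metrics under a
// query id and retrieve them together, including derived ratios such as
// the filter ratio requested in the GC/performance thread.
public class StatisticsRecorder {
    private final Map<String, Map<String, Long>> byQuery = new ConcurrentHashMap<>();

    /** Adds a metric value for a query; repeated calls accumulate. */
    public void record(String queryId, String metric, long value) {
        byQuery.computeIfAbsent(queryId, q -> Collections.synchronizedMap(new LinkedHashMap<>()))
               .merge(metric, value, Long::sum);
    }

    /** e.g. filter ratio = rows surviving the filter / rows scanned. */
    public double ratio(String queryId, String numerator, String denominator) {
        Map<String, Long> m = byQuery.getOrDefault(queryId, Collections.emptyMap());
        long den = m.getOrDefault(denominator, 0L);
        return den == 0 ? 0.0 : (double) m.getOrDefault(numerator, 0L) / den;
    }

    /** All metrics of one query in one place. */
    public Map<String, Long> metricsFor(String queryId) {
        return byQuery.getOrDefault(queryId, Collections.emptyMap());
    }
}
```

The storage backend (table, logs, JSON history) could then be a separate sink behind the same interface, which is what makes the design pluggable.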
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87218055 --- Diff: examples/src/main/scala/org/apache/carbondata/examples/CarbonExample1.scala --- @@ -0,0 +1,340 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.examples + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.examples.util.ExampleUtils + +object CarbonExample1 { --- End diff -- This is wrongly added
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87217857 --- Diff: conf/ss.txt --- @@ -0,0 +1,122 @@ + +Release Notes - CarbonData - Version 0.1.0-incubating --- End diff -- why this is added?
[jira] [Created] (CARBONDATA-400) [Bad Records] Load data is fail and displaying the string value in beeline as exception
MAKAMRAGHUVARDHAN created CARBONDATA-400: Summary: [Bad Records] Load data is fail and displaying the string value in beeline as exception Key: CARBONDATA-400 URL: https://issues.apache.org/jira/browse/CARBONDATA-400 Project: CarbonData Issue Type: Bug Components: data-load Affects Versions: 0.1.0-incubating Environment: 3-node cluster Reporter: MAKAMRAGHUVARDHAN Priority: Minor Steps 1. Create the table: CREATE TABLE String_test2 (string_col string) STORED BY 'org.apache.carbondata.format'; 2. Load the data with the option 'BAD_RECORDS_ACTION'='FORCE' and a CSV that contains a string value that is out of bounds: LOAD DATA INPATH 'hdfs://hacluster/Carbon/Priyal/string5.csv' into table String_test2 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='string_col'); Actual Result: the data load fails and the raw string value is displayed in beeline as the exception trace. Expected Result: a valid exception message should be displayed.
[jira] [Created] (CARBONDATA-399) [Bad Records] Data Load is not FAILED even bad_records_action="FAIL" .
Babulal created CARBONDATA-399: -- Summary: [Bad Records] Data Load is not FAILED even bad_records_action="FAIL" . Key: CARBONDATA-399 URL: https://issues.apache.org/jira/browse/CARBONDATA-399 Project: CarbonData Issue Type: Bug Components: data-load Affects Versions: 0.1.0-incubating Environment: SUSE 11 SP4 YARN HA 3 Nodes Reporter: Babulal Priority: Minor The data load does not FAIL when string data is loaded into an int column.
1. Create the table: create table defect_5 (imei string, deviceInformationId int, mac string, productdate timestamp, updatetime timestamp, gamePointId double, contractNumber double) stored by 'carbondata' TBLPROPERTIES('DICTIONARY_INCLUDE'='deviceInformationId'); deviceInformationId is int (it will be handled as a dimension).
2. Now load the data: 0: jdbc:hive2://ha-cluster/default> LOAD DATA inpath 'hdfs://hacluster/tmp/100_default_date_11_header_2.csv' into table defect_5 options('DELIMITER'=',', 'bad_records_action'='FAIL', 'QUOTECHAR'='"','FILEHEADER'='imei,deviceinformationid,mac,productdate,updatetime,gamepointid,contractnumber');
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.969 seconds)
3. Data:
imei,deviceinformationid,mac,productdate,updatetime,gamepointid,contractnumber
1AA1,babu,Mikaa1,2015-01-01 11:00:00,2015-01-01 13:00:00,10,260
1AA2,3,Mikaa2,2015-01-02 12:00:00,2015-01-01 14:00:00,278,230
1AA3,1,Mikaa1,2015-01-03 13:00:00,2015-01-01 15:00:00,2556,1
1AA4,10,Mikaa2,2015-01-04 14:00:00,2015-01-01 16:00:00,640,254
1AA5,10,Mikaa,2015-01-05 15:00:00,2015-01-01 17:00:00,980,256
1AA6,10,Mikaa,2015-01-06 16:00:00,2015-01-01 18:00:00,1,2378
1AA7,10,Mikaa,2015-01-07 17:00:00,2015-01-01 19:00:00,96,234
1AA8,9,max,2015-01-08 18:00:00,2015-01-01 20:00:00,89,236
1AA9,10,max,2015-01-09 19:00:00,2015-01-01 21:00:00,198.36,239.2
Expected Output: the data load should FAIL.
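The expected behavior can be illustrated with a toy validator (invented names; this is not the actual CarbonData bad-record handler): under the FAIL action, the first unparsable value such as "babu" in an int column should abort the load, whereas FORCE substitutes a null and continues:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of bad-record actions: FAIL aborts on the first
// unparsable int value, FORCE replaces it with null and keeps loading.
public class BadRecordHandler {
    public enum Action { FAIL, FORCE }

    public static List<Integer> loadIntColumn(List<String> raw, Action action) {
        List<Integer> out = new ArrayList<>();
        for (String v : raw) {
            try {
                out.add(Integer.parseInt(v));
            } catch (NumberFormatException e) {
                if (action == Action.FAIL) {
                    throw new IllegalStateException(
                        "Data load failed: bad record '" + v + "' in int column");
                }
                out.add(null);  // FORCE: keep the row, null out the bad value
            }
        }
        return out;
    }
}
```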
[GitHub] incubator-carbondata pull request #308: [CARBONDATA-398] In DropCarbonTable ...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/308#discussion_r87191882 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -1284,14 +1284,18 @@ private[sql] case class DropTableCommand(ifExistsSet: Boolean, databaseNameOp: O .getCarbonLockObj(carbonTableIdentifier, LockUsage.DROP_TABLE_LOCK) val storePath = CarbonEnv.getInstance(sqlContext).carbonCatalog.storePath var isLocked = false +val tmpTable = org.apache.carbondata.core.carbon.metadata.CarbonMetadata.getInstance() + .getCarbonTable(dbName + '_' + tableName) try { - isLocked = carbonLock.lockWithRetries() - if (isLocked) { -logInfo("Successfully able to get the lock for drop.") - } - else { -LOGGER.audit(s"Dropping table $dbName.$tableName failed as the Table is locked") -sys.error("Table is locked for deletion. Please try after some time") + if (null != tmpTable) { --- End diff -- It is better to use `CarbonEnv.getInstance(sqlContext).carbonCatalog.tableExists`
[jira] [Created] (CARBONDATA-398) In DropCarbonTable flow, Metadata lock should be acquired only if table exist
Naresh P R created CARBONDATA-398: - Summary: In DropCarbonTable flow, Metadata lock should be acquired only if table exist Key: CARBONDATA-398 URL: https://issues.apache.org/jira/browse/CARBONDATA-398 Project: CarbonData Issue Type: Bug Reporter: Naresh P R Assignee: Naresh P R Priority: Trivial Issue: in the drop table flow, we acquire the metadata lock even if the table does not exist, which creates the table folder and the meta.lock file. Solution: check that the carbon table exists and acquire the lock only then.
[GitHub] incubator-carbondata pull request #308: [WIP] In DropCarbonTable flow, Metad...
GitHub user nareshpr opened a pull request: https://github.com/apache/incubator-carbondata/pull/308 [WIP] In DropCarbonTable flow, Metadata lock should be acquired only if tab… Issue: in the drop table flow, we acquire the metadata lock even if the table does not exist, which creates the table folder and the meta.lock file. Solution: check that the carbon table exists and acquire the lock only then. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nareshpr/incubator-carbondata droptablefix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/308.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #308 commit 20813ccbffae1aa48bdef069721aeeebc550bc04 Author: nareshpr Date: 2016-11-09T12:43:36Z In DropCarbonTable flow, Metadata lock should be acquired only if table exist
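The pattern being proposed, check table existence before taking the lock so that dropping a non-existent table never creates lock artifacts on disk, can be sketched as follows (interface and method names are invented for illustration; the real fix uses CarbonData's catalog and lock classes):

```java
// Hypothetical sketch: acquire the drop lock only when the table exists,
// so dropping a missing table cannot create the folder or meta.lock file.
public class DropTableFlow {
    interface Catalog { boolean tableExists(String db, String table); }
    interface Lock { boolean lockWithRetries(); void unlock(); }

    public static String dropTable(String db, String table, Catalog catalog, Lock lock) {
        if (!catalog.tableExists(db, table)) {
            // No lock taken: nothing on disk is touched for a missing table.
            return "Table " + db + "." + table + " does not exist";
        }
        if (!lock.lockWithRetries()) {
            return "Table is locked for deletion. Please try after some time";
        }
        try {
            return "Dropped " + db + "." + table;
        } finally {
            lock.unlock();  // always release the metadata lock
        }
    }
}
```

The key design point is the ordering: the existence check is cheap and side-effect free, while lock acquisition creates files, so the check must come first.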
[GitHub] incubator-carbondata pull request #207: [CARBONDATA-283] VT enhancement for ...
Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/207
[jira] [Created] (CARBONDATA-397) Use of ANTLR instead of CarbonSqlParser for parsing queries
Anurag Srivastava created CARBONDATA-397: Summary: Use of ANTLR instead of CarbonSqlParser for parsing queries Key: CARBONDATA-397 URL: https://issues.apache.org/jira/browse/CARBONDATA-397 Project: CarbonData Issue Type: Improvement Reporter: Anurag Srivastava Priority: Minor We are using CarbonSqlParser for parsing queries, but we could use ANTLR for the same and handle query parsing better with it.
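For context, the kind of grammar ANTLR would replace the hand-written parser with looks roughly like this (a sketch only, covering just the CREATE TABLE ... STORED BY ... AS SELECT shape from CARBONDATA-402; the rule names are invented and a real grammar would be far larger, with case-insensitive keywords and the full SELECT syntax):

```antlr
grammar CarbonSql;  // illustrative fragment, not a complete grammar

createTable
    : CREATE TABLE tableName (STORED BY STRING)? (AS selectStatement)? EOF
    ;

tableName : IDENTIFIER ('.' IDENTIFIER)? ;

selectStatement : SELECT '*' FROM tableName ;

CREATE : 'CREATE' ;
TABLE  : 'TABLE' ;
STORED : 'STORED' ;
BY     : 'BY' ;
AS     : 'AS' ;
SELECT : 'SELECT' ;
FROM   : 'FROM' ;
STRING : '\'' (~'\'')* '\'' ;
IDENTIFIER : [a-zA-Z_] [a-zA-Z0-9_]* ;
WS : [ \t\r\n]+ -> skip ;
```

ANTLR generates the lexer, parser, and parse-tree visitors from such a grammar, which is what makes new syntax cheaper to add than extending a combinator parser by hand.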
[GitHub] incubator-carbondata pull request #307: [Carbondata-396] Implement test case...
GitHub user harmeetsingh0013 opened a pull request: https://github.com/apache/incubator-carbondata/pull/307 [Carbondata-396] Implement test cases for datastorage package Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [x] Make sure the PR title is formatted like: `[CARBONDATA-] Description of pull request` - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [x] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - What manual testing you have done? - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
--- You can merge this pull request into a Git repository by running: $ git pull https://github.com/harmeetsingh0013/incubator-carbondata CARBONDATA-396 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/307.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #307 commit f007240376a11a9f2e1e172cc2bffd4b1ad4340a Author: harmeetsingh0013 Date: 2016-11-03T12:09:37Z Write unit test cases for ColumnDictionaryInfo commit 3802bbf528cb3deceef15cdd1bb4e48073a4570f Author: harmeetsingh0013 Date: 2016-11-03T12:38:13Z Add apache license in javadocs commit f76594dc13b8c4a690aa931b9c4d6982c23615c3 Author: harmeetsingh0013 Date: 2016-11-04T05:26:51Z Merge branch 'master' of github.com:apache/incubator-carbondata into CARBONDATA-371 commit 9f672e56a4a96998a556ac454811393edf216562 Author: harmeetsingh0013 Date: 2016-11-08T05:45:13Z Merge branch 'master' of github.com:apache/incubator-carbondata into CARBONDATA-371 commit 0413e2be90fbe7d413efddaacc9b814295cee14f Author: harmeetsingh0013 Date: 2016-11-09T09:36:34Z Write unit test case for BlockIndexerStorageForInt UnBlockIndexerTest ExcludeColGroupFilterExecuterImpl classes commit 69fe75461caf2bceda578d6b655ddf702fe943aa Author: harmeetsingh0013 Date: 2016-11-09T09:37:26Z Write unit test case for BlockIndexerStorageForInt UnBlockIndexerTest ExcludeColGroupFilterExecuterImpl classes commit a131b544e765c05efc97dbe641ad15391f436de2 Author: harmeetsingh0013 Date: 2016-11-09T09:37:42Z Merge branch 'master' of github.com:apache/incubator-carbondata into CARBONDATA-396
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87172561 --- Diff: integration/spark/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java --- @@ -215,6 +227,105 @@ public static void executeGraph(CarbonLoadModel loadModel, String storeLocation, info, loadModel.getPartitionId(), loadModel.getCarbonDataLoadSchema()); } + public static void executeNewDataLoad(CarbonLoadModel loadModel, String storeLocation, + String hdfsStoreLocation, RecordReader[] recordReaders) + throws Exception { +if (!new File(storeLocation).mkdirs()) { + LOGGER.error("Error while creating the temp store path: " + storeLocation); +} +CarbonDataLoadConfiguration configuration = new CarbonDataLoadConfiguration(); +String databaseName = loadModel.getDatabaseName(); +String tableName = loadModel.getTableName(); +String tempLocationKey = databaseName + CarbonCommonConstants.UNDERSCORE + tableName ++ CarbonCommonConstants.UNDERSCORE + loadModel.getTaskNo(); +CarbonProperties.getInstance().addProperty(tempLocationKey, storeLocation); +CarbonProperties.getInstance() +.addProperty(CarbonCommonConstants.STORE_LOCATION_HDFS, hdfsStoreLocation); +// CarbonProperties.getInstance().addProperty("store_output_location", outPutLoc); +CarbonProperties.getInstance().addProperty("send.signal.load", "false"); + +CarbonTable carbonTable = loadModel.getCarbonDataLoadSchema().getCarbonTable(); +AbsoluteTableIdentifier identifier = +carbonTable.getAbsoluteTableIdentifier(); +configuration.setTableIdentifier(identifier); +String csvHeader = loadModel.getCsvHeader(); +String csvFileName = null; +if (csvHeader != null && !csvHeader.isEmpty()) { + configuration.setHeader(CarbonDataProcessorUtil.getColumnFields(csvHeader, ",")); +} else { + CarbonFile csvFile = + CarbonDataProcessorUtil.getCsvFileToRead(loadModel.getFactFilesToProcess().get(0)); + csvFileName = csvFile.getName(); + csvHeader = 
CarbonDataProcessorUtil.getFileHeader(csvFile); + configuration.setHeader( + CarbonDataProcessorUtil.getColumnFields(csvHeader, loadModel.getCsvDelimiter())); +} +CarbonDataProcessorUtil +.validateHeader(loadModel.getTableName(), csvHeader, loadModel.getCarbonDataLoadSchema(), +loadModel.getCsvDelimiter(), csvFileName); + +configuration.setPartitionId(loadModel.getPartitionId()); +configuration.setSegmentId(loadModel.getSegmentId()); +configuration.setTaskNo(loadModel.getTaskNo()); + configuration.setDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS, +new String[] { loadModel.getComplexDelimiterLevel1(), +loadModel.getComplexDelimiterLevel2() }); + configuration.setDataLoadProperty(DataLoadProcessorConstants.SERIALIZATION_NULL_FORMAT, +loadModel.getSerializationNullFormat().split(",")[1]); + configuration.setDataLoadProperty(DataLoadProcessorConstants.FACT_TIME_STAMP, +loadModel.getFactTimeStamp()); + configuration.setDataLoadProperty(DataLoadProcessorConstants.BAD_RECORDS_LOGGER_ENABLE, +loadModel.getBadRecordsLoggerEnable().split(",")[1]); + configuration.setDataLoadProperty(DataLoadProcessorConstants.BAD_RECORDS_LOGGER_ACTION, +loadModel.getBadRecordsAction().split(",")[1]); + configuration.setDataLoadProperty(DataLoadProcessorConstants.FACT_FILE_PATH, +loadModel.getFactFilePath()); +List dimensions = + carbonTable.getDimensionByTableName(carbonTable.getFactTableName()); +List measures = +carbonTable.getMeasureByTableName(carbonTable.getFactTableName()); +Map dateFormatMap = + CarbonDataProcessorUtil.getDateFormatMap(loadModel.getDateFormat()); +List dataFields = new ArrayList<>(); +List complexDataFields = new ArrayList<>(); + +// First add dictionary and non dictionary dimensions because these are part of mdk key. +// And then add complex data types and measures. 
+for (CarbonColumn column : dimensions) { + DataField dataField = new DataField(column); + dataField.setDateFormat(dateFormatMap.get(column.getColName())); + if (column.isComplex()) { +complexDataFields.add(dataField); + } else { +dataFields.add(dataField); + } +} +dataFields.addAll(complexDataFields); +for (CarbonColumn column : measures) { + // This dummy measure is added when no measure was present. We no need to load it. + if (!(column.getColNam
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87171862 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/converter/impl/RowConverterImpl.java --- @@ -43,35 +45,58 @@ private CarbonDataLoadConfiguration configuration; + private DataField[] fields; + private FieldConverter[] fieldConverters; - public RowConverterImpl(DataField[] fields, CarbonDataLoadConfiguration configuration) { + private BadRecordslogger badRecordLogger; + + private BadRecordLogHolder logHolder; + + public RowConverterImpl(DataField[] fields, CarbonDataLoadConfiguration configuration, + BadRecordslogger badRecordLogger) { +this.fields = fields; this.configuration = configuration; +this.badRecordLogger = badRecordLogger; + } + + @Override + public void initialize() { CacheProvider cacheProvider = CacheProvider.getInstance(); Cache cache = cacheProvider.createCache(CacheType.REVERSE_DICTIONARY, configuration.getTableIdentifier().getStorePath()); +String nullFormat = + configuration.getDataLoadProperty(DataLoadProcessorConstants.SERIALIZATION_NULL_FORMAT) +.toString(); List fieldConverterList = new ArrayList<>(); long lruCacheStartTime = System.currentTimeMillis(); for (int i = 0; i < fields.length; i++) { FieldConverter fieldConverter = FieldEncoderFactory.getInstance() .createFieldEncoder(fields[i], cache, - configuration.getTableIdentifier().getCarbonTableIdentifier(), i); - if (fieldConverter != null) { -fieldConverterList.add(fieldConverter); - } + configuration.getTableIdentifier().getCarbonTableIdentifier(), i, nullFormat); + fieldConverterList.add(fieldConverter); } CarbonTimeStatisticsFactory.getLoadStatisticsInstance() .recordLruCacheLoadTime((System.currentTimeMillis() - lruCacheStartTime) / 1000.0); fieldConverters = fieldConverterList.toArray(new FieldConverter[fieldConverterList.size()]); +logHolder = new BadRecordLogHolder(); } @Override public CarbonRow convert(CarbonRow 
row) throws CarbonDataLoadingException { +CarbonRow copy = row.getCopy(); --- End diff -- It is required because the fieldConverters update the row in place; if a bad record is found in the last column, we don't want to hand already-converted data to the logger. ---
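The point about `row.getCopy()` can be illustrated with a minimal, hypothetical sketch (all class and method names below are invented for illustration; only the copy-before-convert pattern mirrors the change under review): the converters mutate the row in place, so without a copy the bad-record log would contain partially converted values instead of the user's original input.

```java
import java.util.Arrays;

public class CopyBeforeConvertSketch {
    // Hypothetical converter: mutates the row in place (e.g. replaces a value
    // with a dictionary surrogate), and reports a bad record on null input.
    public static boolean convertInPlace(Object[] row) {
        for (int i = 0; i < row.length; i++) {
            if (row[i] == null) {
                return false; // bad record detected in this column
            }
            row[i] = "dict:" + row[i]; // in-place mutation
        }
        return true;
    }

    // Returns the row that should be handed to the bad-record logger,
    // or null when the row converted cleanly.
    public static Object[] convert(Object[] row) {
        Object[] copy = Arrays.copyOf(row, row.length); // preserve originals
        if (!convertInPlace(row)) {
            return copy; // log the ORIGINAL values, not half-converted ones
        }
        return null;
    }

    public static void main(String[] args) {
        Object[] bad = {"a", "b", null};
        Object[] logged = convert(bad);
        // the first two columns of 'bad' were mutated before the bad record
        // in the last column was hit; the copy still holds the originals
        System.out.println(Arrays.toString(bad));    // [dict:a, dict:b, null]
        System.out.println(Arrays.toString(logged)); // [a, b, null]
    }
}
```

Without the copy, the logger would see `[dict:a, dict:b, null]`, which is exactly the situation the review comment describes.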
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87171880 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/impl/ParallelReadMergeSorterImpl.java --- @@ -102,21 +102,14 @@ public void initialize(SortParameters sortParameters) { } this.executorService = Executors.newFixedThreadPool(iterators.length); -// First prepare the data for sort. -Iterator[] sortPrepIterators = new Iterator[iterators.length]; -for (int i = 0; i < sortPrepIterators.length; i++) { - sortPrepIterators[i] = new SortPreparatorIterator(iterators[i], inputDataFields); -} - -for (int i = 0; i < sortDataRows.length; i++) { - executorService - .submit(new SortIteratorThread(sortPrepIterators[i], sortDataRows[i], sortParameters)); -} - try { + for (int i = 0; i < sortDataRows.length; i++) { +executorService +.submit(new SortIteratorThread(iterators[i], sortDataRows[i], sortParameters)); --- End diff -- ok ---
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87171144 --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/IntermediateFileMerger.java --- @@ -110,7 +116,11 @@ public IntermediateFileMerger(SortParameters mergerParameters, File[] intermedia initialize(); while (hasNext()) { -writeDataTofile(next()); +if (useKettle) { --- End diff -- Ok ---
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87171167 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/steps/SortProcessorStepImpl.java --- @@ -50,6 +50,7 @@ public SortProcessorStepImpl(CarbonDataLoadConfiguration configuration, @Override public void initialize() throws CarbonDataLoadingException { +super.initialize(); --- End diff -- Ok ---
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87171187 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/steps/DataConverterProcessorStepImpl.java --- @@ -47,20 +58,109 @@ public DataConverterProcessorStepImpl(CarbonDataLoadConfiguration configuration, @Override public void initialize() throws CarbonDataLoadingException { -encoder = new RowConverterImpl(child.getOutput(), configuration); -child.initialize(); +super.initialize(); +BadRecordslogger badRecordLogger = createBadRecordLogger(); +converter = new RowConverterImpl(child.getOutput(), configuration, badRecordLogger); +converter.initialize(); + } + + /** + * Create the iterator using child iterator. + * + * @param childIter + * @return new iterator with step specific processing. + */ + @Override + protected Iterator getIterator(final Iterator childIter) { +return new CarbonIterator() { + RowConverter localConverter = converter.createCopyForNewThread(); + @Override public boolean hasNext() { +return childIter.hasNext(); + } + + @Override public CarbonRowBatch next() { +return processRowBatch(childIter.next(), localConverter); + } +}; + } + + /** + * Process the batch of rows as per the step logic. + * + * @param rowBatch + * @return processed row. 
+ */ + protected CarbonRowBatch processRowBatch(CarbonRowBatch rowBatch, RowConverter localConverter) { +CarbonRowBatch newBatch = new CarbonRowBatch(); +Iterator batchIterator = rowBatch.getBatchIterator(); +while (batchIterator.hasNext()) { + newBatch.addRow(localConverter.convert(batchIterator.next())); +} +return newBatch; } @Override protected CarbonRow processRow(CarbonRow row) { -return encoder.convert(row); +// Not implemented +return null; + } + + private BadRecordslogger createBadRecordLogger() { +boolean badRecordsLogRedirect = false; +boolean badRecordConvertNullDisable = false; +boolean badRecordsLoggerEnable = Boolean.parseBoolean( + configuration.getDataLoadProperty(DataLoadProcessorConstants.BAD_RECORDS_LOGGER_ENABLE) +.toString()); +Object bad_records_action = + configuration.getDataLoadProperty(DataLoadProcessorConstants.BAD_RECORDS_LOGGER_ACTION) +.toString(); +if (null != bad_records_action) { + LoggerAction loggerAction = null; + try { +loggerAction = LoggerAction.valueOf(bad_records_action.toString().toUpperCase()); + } catch (IllegalArgumentException e) { +loggerAction = LoggerAction.FORCE; + } + switch (loggerAction) { +case FORCE: + badRecordConvertNullDisable = false; + break; +case REDIRECT: + badRecordsLogRedirect = true; + badRecordConvertNullDisable = true; + break; +case IGNORE: + badRecordsLogRedirect = false; + badRecordConvertNullDisable = true; + break; +} +} +CarbonTableIdentifier identifier = +configuration.getTableIdentifier().getCarbonTableIdentifier(); +String key = identifier.getDatabaseName() + '/' + identifier.getTableName() + '_' + identifier +.getTableName(); +BadRecordslogger badRecordslogger = +new BadRecordslogger(key, identifier.getTableName() + '_' + System.currentTimeMillis(), +getBadLogStoreLocation( +identifier.getDatabaseName() + '/' + identifier.getTableName() + "/" + configuration --- End diff -- ok ---
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170923 --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortParameters.java --- @@ -122,6 +116,11 @@ private int numberOfCores; + /** + * Temporary conf , it will be removed after refactor. --- End diff -- ok ---
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170873 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/BadRecordslogger.java --- @@ -81,13 +81,24 @@ */ private String taskKey; + private boolean badRecordsLogRedirect; + + private boolean badRecordLoggerEnable; + + private boolean badRecordConvertNullDisable; + // private final Object syncObject =new Object(); - public BadRecordslogger(String key, String fileName, String storePath) { + public BadRecordslogger(String key, String fileName, String storePath, --- End diff -- ok ---
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170942 --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortDataRows.java --- @@ -264,6 +277,72 @@ private void writeData(Object[][] recordHolderList, int entryCountLocal, File fi } } + private void writeDataWithOutKettle(Object[][] recordHolderList, int entryCountLocal, File file) + throws CarbonSortKeyAndGroupByException { +DataOutputStream stream = null; +try { + // open stream + stream = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(file), + parameters.getFileWriteBufferSize())); + + // write number of entries to the file + stream.writeInt(entryCountLocal); + int complexDimColCount = parameters.getComplexDimColCount(); + int dimColCount = parameters.getDimColCount() + complexDimColCount; + char[] aggType = parameters.getAggType(); + boolean[] noDictionaryDimnesionMapping = parameters.getNoDictionaryDimnesionColumn(); + Object[] row = null; + for (int i = 0; i < entryCountLocal; i++) { +// get row from record holder list +row = recordHolderList[i]; +int dimCount = 0; --- End diff -- ok ---
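The write path quoted above streams sorted rows into a temp file through a DataOutputStream, with the entry count written first as a header. A simplified, self-contained sketch of that round-trip idea follows; the file format here is invented for illustration (int-only columns) and omits the real code's dictionary/no-dictionary/measure handling and buffer-size configuration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class SortTempFileSketch {
    // Write: number of rows first, then each row's length and values.
    public static void writeRows(File file, int[][] rows) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(rows.length); // entry-count header, as in the diff
            for (int[] row : rows) {
                out.writeInt(row.length);
                for (int v : row) {
                    out.writeInt(v);
                }
            }
        }
    }

    // Read the same layout back: header tells us how many rows to expect.
    public static int[][] readRows(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            int[][] rows = new int[in.readInt()][];
            for (int i = 0; i < rows.length; i++) {
                rows[i] = new int[in.readInt()];
                for (int j = 0; j < rows[i].length; j++) {
                    rows[i][j] = in.readInt();
                }
            }
            return rows;
        }
    }
}
```

Writing the count up front is what lets the reader side (the chunk holders that merge these temp files) allocate and iterate without a sentinel value.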
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170892 --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortTempFileChunkHolder.java --- @@ -136,6 +136,9 @@ */ private boolean[] isNoDictionaryDimensionColumn; + // temporary configuration + private boolean useKettle; --- End diff -- ok ---
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170828 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -396,4 +407,223 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB } return complexTypesMap; } + + /** + * Get the csv file to read if it the path is file otherwise get the first file of directory. + * + * @param csvFilePath + * @return File + */ + public static CarbonFile getCsvFileToRead(String csvFilePath) { +CarbonFile csvFile = +FileFactory.getCarbonFile(csvFilePath, FileFactory.getFileType(csvFilePath)); + +CarbonFile[] listFiles = null; +if (csvFile.isDirectory()) { + listFiles = csvFile.listFiles(new CarbonFileFilter() { +@Override public boolean accept(CarbonFile pathname) { + if (!pathname.isDirectory()) { +if (pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || pathname + .getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION ++ CarbonCommonConstants.FILE_INPROGRESS_STATUS)) { + return true; +} + } + + return false; +} + }); +} else { + listFiles = new CarbonFile[1]; + listFiles[0] = csvFile; + --- End diff -- ok ---
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170805 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -396,4 +407,223 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB } return complexTypesMap; } + + /** + * Get the csv file to read if it the path is file otherwise get the first file of directory. + * + * @param csvFilePath + * @return File + */ + public static CarbonFile getCsvFileToRead(String csvFilePath) { +CarbonFile csvFile = +FileFactory.getCarbonFile(csvFilePath, FileFactory.getFileType(csvFilePath)); + +CarbonFile[] listFiles = null; +if (csvFile.isDirectory()) { + listFiles = csvFile.listFiles(new CarbonFileFilter() { +@Override public boolean accept(CarbonFile pathname) { + if (!pathname.isDirectory()) { +if (pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || pathname + .getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION ++ CarbonCommonConstants.FILE_INPROGRESS_STATUS)) { + return true; +} + } + --- End diff -- ok ---
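The selection logic in the quoted getCsvFileToRead can be condensed into a small stand-in sketch. This uses java.io.File instead of CarbonFile/FileFactory, so it only covers local paths (the real abstraction also handles HDFS, as noted later in this thread), and the ".csv" / ".inprogress" suffixes are inlined assumptions standing in for the CarbonCommonConstants values.

```java
import java.io.File;

public class CsvFilePicker {
    /**
     * Simplified stand-in: if the path points to a file, return it directly;
     * if it is a directory, return the first entry whose name ends in ".csv"
     * or ".csv.inprogress". Returns null when nothing matches.
     */
    public static File getCsvFileToRead(String csvFilePath) {
        File csvFile = new File(csvFilePath);
        if (!csvFile.isDirectory()) {
            return csvFile; // path is a plain file: use it as-is
        }
        // java.io.FileFilter is a functional interface, so a lambda works
        File[] matches = csvFile.listFiles(pathname ->
                !pathname.isDirectory()
                        && (pathname.getName().endsWith(".csv")
                            || pathname.getName().endsWith(".csv.inprogress")));
        return (matches != null && matches.length > 0) ? matches[0] : null;
    }
}
```

The directory branch mirrors the CarbonFileFilter anonymous class in the diff; the review back-and-forth here is mostly about simplifying that nested if/return structure.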
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170789 --- Diff: integration/spark/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java --- @@ -215,6 +227,105 @@ public static void executeGraph(CarbonLoadModel loadModel, String storeLocation, info, loadModel.getPartitionId(), loadModel.getCarbonDataLoadSchema()); } + public static void executeNewDataLoad(CarbonLoadModel loadModel, String storeLocation, + String hdfsStoreLocation, RecordReader[] recordReaders) + throws Exception { +if (!new File(storeLocation).mkdirs()) { + LOGGER.error("Error while creating the temp store path: " + storeLocation); +} +CarbonDataLoadConfiguration configuration = new CarbonDataLoadConfiguration(); +String databaseName = loadModel.getDatabaseName(); +String tableName = loadModel.getTableName(); +String tempLocationKey = databaseName + CarbonCommonConstants.UNDERSCORE + tableName ++ CarbonCommonConstants.UNDERSCORE + loadModel.getTaskNo(); +CarbonProperties.getInstance().addProperty(tempLocationKey, storeLocation); +CarbonProperties.getInstance() +.addProperty(CarbonCommonConstants.STORE_LOCATION_HDFS, hdfsStoreLocation); +// CarbonProperties.getInstance().addProperty("store_output_location", outPutLoc); +CarbonProperties.getInstance().addProperty("send.signal.load", "false"); + +CarbonTable carbonTable = loadModel.getCarbonDataLoadSchema().getCarbonTable(); +AbsoluteTableIdentifier identifier = +carbonTable.getAbsoluteTableIdentifier(); +configuration.setTableIdentifier(identifier); +String csvHeader = loadModel.getCsvHeader(); +String csvFileName = null; +if (csvHeader != null && !csvHeader.isEmpty()) { + configuration.setHeader(CarbonDataProcessorUtil.getColumnFields(csvHeader, ",")); +} else { + CarbonFile csvFile = + CarbonDataProcessorUtil.getCsvFileToRead(loadModel.getFactFilesToProcess().get(0)); + csvFileName = csvFile.getName(); + csvHeader = 
CarbonDataProcessorUtil.getFileHeader(csvFile); + configuration.setHeader( + CarbonDataProcessorUtil.getColumnFields(csvHeader, loadModel.getCsvDelimiter())); +} +CarbonDataProcessorUtil +.validateHeader(loadModel.getTableName(), csvHeader, loadModel.getCarbonDataLoadSchema(), +loadModel.getCsvDelimiter(), csvFileName); --- End diff -- Ok. Modified. ---
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170848 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java --- @@ -952,10 +951,9 @@ private String getCarbonLocalBaseStoreLocation() { // In that case it will have first value empty and other values will be null // So If records is coming like this then we need to write this records as a bad Record. -if (null == r[0] && badRecordConvertNullDisable) { +if (null == r[0] && badRecordslogger.isBadRecordConvertNullDisable()) { badRecordslogger - .addBadRecordsToBuilder(r, "Column Names are coming NULL", "null", - badRecordsLogRedirect, badRecordsLoggerEnable); + .addBadRecordsToBuilder(r, "Column Names are coming NULL", "null"); --- End diff -- ok ---
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170688 --- Diff: integration/spark/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java --- @@ -215,6 +227,105 @@ public static void executeGraph(CarbonLoadModel loadModel, String storeLocation, info, loadModel.getPartitionId(), loadModel.getCarbonDataLoadSchema()); } + public static void executeNewDataLoad(CarbonLoadModel loadModel, String storeLocation, + String hdfsStoreLocation, RecordReader[] recordReaders) + throws Exception { +if (!new File(storeLocation).mkdirs()) { + LOGGER.error("Error while creating the temp store path: " + storeLocation); +} +CarbonDataLoadConfiguration configuration = new CarbonDataLoadConfiguration(); +String databaseName = loadModel.getDatabaseName(); +String tableName = loadModel.getTableName(); +String tempLocationKey = databaseName + CarbonCommonConstants.UNDERSCORE + tableName ++ CarbonCommonConstants.UNDERSCORE + loadModel.getTaskNo(); +CarbonProperties.getInstance().addProperty(tempLocationKey, storeLocation); +CarbonProperties.getInstance() +.addProperty(CarbonCommonConstants.STORE_LOCATION_HDFS, hdfsStoreLocation); +// CarbonProperties.getInstance().addProperty("store_output_location", outPutLoc); +CarbonProperties.getInstance().addProperty("send.signal.load", "false"); + +CarbonTable carbonTable = loadModel.getCarbonDataLoadSchema().getCarbonTable(); +AbsoluteTableIdentifier identifier = +carbonTable.getAbsoluteTableIdentifier(); +configuration.setTableIdentifier(identifier); +String csvHeader = loadModel.getCsvHeader(); +String csvFileName = null; +if (csvHeader != null && !csvHeader.isEmpty()) { + configuration.setHeader(CarbonDataProcessorUtil.getColumnFields(csvHeader, ",")); +} else { + CarbonFile csvFile = + CarbonDataProcessorUtil.getCsvFileToRead(loadModel.getFactFilesToProcess().get(0)); --- End diff -- The csv file can also exist inside HDFS; that's why we are creating a CarbonFile. ---
[GitHub] incubator-carbondata pull request #298: [CARBONDATA-383]Optimize hdfsStoreLo...
Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/298 ---
[jira] [Created] (CARBONDATA-396) Implement test cases for datastorage package
Anurag Srivastava created CARBONDATA-396:
Summary: Implement test cases for datastorage package
Key: CARBONDATA-396
URL: https://issues.apache.org/jira/browse/CARBONDATA-396
Project: CarbonData
Issue Type: Test
Reporter: Anurag Srivastava
Priority: Trivial