[jira] [Created] (CARBONDATA-402) carbon should support CreateAsSelect

2016-11-09 Thread Jay (JIRA)
Jay created CARBONDATA-402:
--

 Summary: carbon should support CreateAsSelect 
 Key: CARBONDATA-402
 URL: https://issues.apache.org/jira/browse/CARBONDATA-402
 Project: CarbonData
  Issue Type: Improvement
Reporter: Jay
Priority: Minor


Provide support for Create Table As Select (CTAS).
The syntax is the Hive syntax, like below:
CREATE TABLE table4 STORED BY 'carbondata' AS SELECT * FROM table3
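
For illustration, a minimal sketch of how the proposed statement could be issued through
CarbonContext (illustrative only: table3/table4 are example names, and the two-argument
CarbonContext constructor and the store path are assumptions, not something this JIRA defines):

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.CarbonContext

    // Sketch only: CTAS is the feature requested by this JIRA.
    object CtasSketch {
      def run(sc: SparkContext, storePath: String): Unit = {
        val cc = new CarbonContext(sc, storePath)
        // table3 is assumed to already exist and contain data.
        cc.sql("CREATE TABLE table4 STORED BY 'carbondata' AS SELECT * FROM table3")
        cc.sql("SELECT count(*) FROM table4").show()
      }
    }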



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-carbondata pull request #309: [WIP]support CreateAsSelect

2016-11-09 Thread Jay357089
GitHub user Jay357089 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/309

[WIP]support CreateAsSelect

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[CARBONDATA-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).
 - [ ] Testing done
 
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- What manual testing you have done?
- Any additional information to help reviewers in testing this 
change.
 
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
 
---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Jay357089/incubator-carbondata CreateAsSelect

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/309.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #309


commit f0edae8d1a6d68c3d8020789753ff7cc91ec3179
Author: Jay357089 
Date:   2016-11-10T07:01:37Z

support CreateAsSelect




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


GC problem and performance refine problem

2016-11-09 Thread Anning Luo
Hi,

We are using CarbonData to build our tables and run queries through CarbonContext.
We have run into some performance problems while tuning the system.

Background:
cluster:  100 executors, 5 tasks/executor, 10 GB memory/executor
data: 60+ GB (one replica) in CarbonData format, 100 files of 600+ MB each,
300+ columns, 300+ million rows
sql example:
select A,
       sum(a),
       sum(b),
       sum(c),
       ... (extra 100 aggregations like sum(column))
from Table1 LATERAL VIEW explode(split(Aarray, ';')) ATable AS A
where A is not null and d > "ab:c-10" and d < "h:0f3s" and e != 10
      and f = 22 and g = 33 and h = 44
GROUP BY A
target query time:  <10s
current query time: 15s ~ 25s
scene: OLAP system with fewer than 100 queries per day. The concurrency is not high and
the CPU is idle most of the time, so this service will run alongside other programs. The
service will run for a long time, so we cannot occupy a very large amount of memory for
every executor.
refine: I have built an index and dictionary on d, e, f, g, h and built dictionaries on
all the other aggregation columns (i.e. a, b, c, ... 100+ columns), and made sure the
whole data set is in one segment. I have enabled speculation (quantile=0.5, interval=250,
multiplier=1.2).

Time is mainly spent in the first stage, before shuffling. As 95% of the data is
filtered out, the shuffle itself takes little time. In the first stage most tasks
complete in less than 10s, but there are still nearly 50 tasks that take longer than 10s.
The maximum task time for a query may be 12~16s.

Problems:
1.  GC problem. After a lot of parameter tuning we still see 20%~30% GC time for some
tasks in the first stage. We now use G1 GC on Java 8; GC time roughly doubles with CMS.
The main GC time is spent on young-generation GC, and almost half of the young
generation is copied to the old generation. It seems many objects live longer than one
GC period and their space is not reused (the concurrent GC only releases it later).
When we use a large Eden (>=1G, for example), a single GC takes seconds; if we set Eden
small (256M, for example), a single GC takes hundreds of milliseconds but happens more
frequently, and the total is still seconds. Is there any way to lessen the GC time?
(We are not considering the first and second query in this case.) A sketch of the
corresponding Spark configuration is given after this list.
2.  Performance tuning problem. The number of rows left after filtering is not uniform,
so some nodes are heavier and spend more time than others. The time of one task ranges
from 4s to 16s. Is there any method to improve this?
3.  The first and second query take too long. I know the dictionary and some indexes
need to be loaded for the first query, but even after trying to warm them up with the
query below, it still takes a lot of time. How can I warm up the queries correctly?
select Aarray, a, b, c ... from Table1 where Aarray is not
null and d = "sss" and e != 22 and f = 33 and g = 44 and h = 55
4. Any other suggestions to reduce the query time?
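
For reference, a minimal sketch of how the executor and GC settings above map onto a Spark
configuration (the values mirror what we described and are still being tuned, not a
recommendation; the executor and speculation keys are standard Spark properties, the GC
flags are plain Java 8 options):

    import org.apache.spark.SparkConf

    // Sketch only: mirrors the cluster, GC and speculation settings described above.
    object QueryServiceConf {
      val conf: SparkConf = new SparkConf()
        .set("spark.executor.instances", "100")
        .set("spark.executor.cores", "5")
        .set("spark.executor.memory", "10g")
        // G1 on Java 8; Eden size is the knob we varied between ~256M and >=1G.
        .set("spark.executor.extraJavaOptions",
          "-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
        // Speculation as described: quantile=0.5, interval=250, multiplier=1.2.
        .set("spark.speculation", "true")
        .set("spark.speculation.quantile", "0.5")
        .set("spark.speculation.interval", "250ms")
        .set("spark.speculation.multiplier", "1.2")
    }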

Some suggestions:
The log produced by the QueryStatisticsRecorder class gives me a good means to find the
bottleneck, but it is not enough. There are still some metrics I think would be very
useful:
1. Filter ratio, i.e. not only result_size but also the original size, so we could know
how much data was filtered out.
2. IO time. The scan_blocks_time is not enough: if it is high we know something is wrong,
but not what caused the problem. The real IO time for the data is not provided. As there
may be several files for one partition, knowing whether the slowness is caused by the
datanode or by the executor itself would give us intuition to find the problem.
3. The TableBlockInfo for each task. I log it by myself when debugging; it tells me how
many blocklets are local. The Spark web UI just gives a locality level, but maybe only
one blocklet is actually local.


Re: [VOTE] Apache CarbonData 0.2.0-incubating release

2016-11-09 Thread Aniket Adnaik
+1

Regards,
Aniket

On 9 Nov 2016 3:17 p.m., "Liang Chen"  wrote:

> Hi all,
>
> I submit the CarbonData 0.2.0-incubating to your vote.
>
> Release Notes:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12320220&version=12337896
>
> Staging Repository:
> https://repository.apache.org/content/repositories/
> orgapachecarbondata-1006
>
> Git Tag:
> carbondata-0.2.0-incubating
>
> Please vote to approve this release:
> [ ] +1 Approve the release
> [ ] -1 Don't approve the release (please provide specific comments)
>
> This vote will be open for at least 72 hours. If this vote passes (we need
> at least 3 binding votes, meaning three votes from the PPMC), I will
> forward to gene...@incubator.apache.org for  the IPMC votes.
>
> Here is my vote : +1 (binding)
>
> Regards
> Liang
>


[jira] [Created] (CARBONDATA-401) Look forward to support reading csv file only once in data loading

2016-11-09 Thread Lionx (JIRA)
Lionx created CARBONDATA-401:


 Summary: Look forward to support reading csv file only once in 
data loading 
 Key: CARBONDATA-401
 URL: https://issues.apache.org/jira/browse/CARBONDATA-401
 Project: CarbonData
  Issue Type: Improvement
Reporter: Lionx
Assignee: Lionx


Currently, in the Carbon data loading module, generating the global dictionary is an
independent step: Carbon reads the CSV file twice, once to generate the global dictionary
and once to load the carbon data. We look forward to reading the CSV file
only once.
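
A minimal sketch of the single-pass idea in plain Scala (illustrative only, not the actual
CarbonData loading code): while streaming the CSV once, assign dictionary surrogate keys on
the fly and emit the encoded row in the same pass.

    import scala.collection.mutable
    import scala.io.Source

    // Sketch of a single-pass load: build dictionaries and encode rows in one scan.
    object SinglePassLoadSketch {
      def load(csvPath: String, dictionaryColumns: Set[Int]): Seq[Array[Any]] = {
        // One growing dictionary per dictionary-encoded column: value -> surrogate key.
        val dictionaries = mutable.Map[Int, mutable.LinkedHashMap[String, Int]]()

        def surrogateKey(col: Int, value: String): Int = {
          val dict = dictionaries.getOrElseUpdate(col, mutable.LinkedHashMap[String, Int]())
          dict.getOrElseUpdate(value, dict.size + 1)
        }

        val source = Source.fromFile(csvPath)
        try {
          source.getLines().drop(1).map { line =>           // drop(1): skip the header row
            line.split(",", -1).zipWithIndex.map { case (value, col) =>
              if (dictionaryColumns.contains(col)) surrogateKey(col, value) // encoded column
              else value                                                    // raw column
            }
          }.toList                                          // materialize before closing the file
        } finally {
          source.close()
        }
      }
    }

In the current two-pass flow the first scan only builds the global dictionary and the second
scan does the encoding; the idea above folds both into one scan (a real implementation would
also have to make the dictionary global across input splits).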



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Apache CarbonData 0.2.0-incubating release

2016-11-09 Thread Vimal Das Kammath
+1

-Vimal
On Nov 10, 2016 4:47 AM, "Liang Chen"  wrote:

> Hi all,
>
> I submit the CarbonData 0.2.0-incubating to your vote.
>
> Release Notes:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12320220&version=12337896
>
> Staging Repository:
> https://repository.apache.org/content/repositories/
> orgapachecarbondata-1006
>
> Git Tag:
> carbondata-0.2.0-incubating
>
> Please vote to approve this release:
> [ ] +1 Approve the release
> [ ] -1 Don't approve the release (please provide specific comments)
>
> This vote will be open for at least 72 hours. If this vote passes (we need
> at least 3 binding votes, meaning three votes from the PPMC), I will
> forward to gene...@incubator.apache.org for  the IPMC votes.
>
> Here is my vote : +1 (binding)
>
> Regards
> Liang
>


Re: RE: [VOTE] Apache CarbonData 0.2.0-incubating release

2016-11-09 Thread Kumar Vishal
+1

-Regards
Kumar Vishal

On Nov 10, 2016 07:48, "Ravindra Pesala"  wrote:

> +1
>
> On Thu, Nov 10, 2016, 7:07 AM Jay <2550062...@qq.com> wrote:
>
> > +1
> >
> >
> > Regards
> > Jay
> >
> >
> > -- Original Message --
> > From: "Jihong Ma";;
> > Sent: Thursday, November 10, 2016, 7:58 AM
> > To: "dev@carbondata.incubator.apache.org"<
> > dev@carbondata.incubator.apache.org>; "chenliang...@apache.org"<
> > chenliang...@apache.org>;
> >
> > Subject: RE: [VOTE] Apache CarbonData 0.2.0-incubating release
> >
> >
> >
> > +1 binding.
> >
> > Jihong
> >
> > -Original Message-
> > From: Liang Chen [mailto:chenliang6...@gmail.com]
> > Sent: Wednesday, November 09, 2016 3:18 PM
> > To: dev@carbondata.incubator.apache.org
> > Subject: [VOTE] Apache CarbonData 0.2.0-incubating release
> >
> > Hi all,
> >
> > I submit the CarbonData 0.2.0-incubating to your vote.
> >
> > Release Notes:
> >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12320220&version=12337896
> >
> > Staging Repository:
> > https://repository.apache.org/content/repositories/
> orgapachecarbondata-1006
> >
> > Git Tag:
> > carbondata-0.2.0-incubating
> >
> > Please vote to approve this release:
> > [ ] +1 Approve the release
> > [ ] -1 Don't approve the release (please provide specific comments)
> >
> > This vote will be open for at least 72 hours. If this vote passes (we
> need
> > at least 3 binding votes, meaning three votes from the PPMC), I will
> > forward to gene...@incubator.apache.org for  the IPMC votes.
> >
> > Here is my vote : +1 (binding)
> >
> > Regards
> > Liang
>


Re: RE: [VOTE] Apache CarbonData 0.2.0-incubating release

2016-11-09 Thread Ravindra Pesala
+1

On Thu, Nov 10, 2016, 7:07 AM Jay <2550062...@qq.com> wrote:

> +1
>
>
> Regards
> Jay
>
>
> -- Original Message --
> From: "Jihong Ma";;
> Sent: Thursday, November 10, 2016, 7:58 AM
> To: "dev@carbondata.incubator.apache.org"<
> dev@carbondata.incubator.apache.org>; "chenliang...@apache.org"<
> chenliang...@apache.org>;
>
> Subject: RE: [VOTE] Apache CarbonData 0.2.0-incubating release
>
>
>
> +1 binding.
>
> Jihong
>
> -Original Message-
> From: Liang Chen [mailto:chenliang6...@gmail.com]
> Sent: Wednesday, November 09, 2016 3:18 PM
> To: dev@carbondata.incubator.apache.org
> Subject: [VOTE] Apache CarbonData 0.2.0-incubating release
>
> Hi all,
>
> I submit the CarbonData 0.2.0-incubating to your vote.
>
> Release Notes:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12337896
>
> Staging Repository:
> https://repository.apache.org/content/repositories/orgapachecarbondata-1006
>
> Git Tag:
> carbondata-0.2.0-incubating
>
> Please vote to approve this release:
> [ ] +1 Approve the release
> [ ] -1 Don't approve the release (please provide specific comments)
>
> This vote will be open for at least 72 hours. If this vote passes (we need
> at least 3 binding votes, meaning three votes from the PPMC), I will
> forward to gene...@incubator.apache.org for  the IPMC votes.
>
> Here is my vote : +1 (binding)
>
> Regards
> Liang


Re: RE: [VOTE] Apache CarbonData 0.2.0-incubating release

2016-11-09 Thread Jay
+1


Regards
Jay


-- Original Message --
From: "Jihong Ma";;
Sent: Thursday, November 10, 2016, 7:58 AM
To: "dev@carbondata.incubator.apache.org";
"chenliang...@apache.org";

Subject: RE: [VOTE] Apache CarbonData 0.2.0-incubating release



+1 binding.

Jihong

-Original Message-
From: Liang Chen [mailto:chenliang6...@gmail.com] 
Sent: Wednesday, November 09, 2016 3:18 PM
To: dev@carbondata.incubator.apache.org
Subject: [VOTE] Apache CarbonData 0.2.0-incubating release

Hi all,

I submit the CarbonData 0.2.0-incubating to your vote.

Release Notes:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12337896

Staging Repository:
https://repository.apache.org/content/repositories/orgapachecarbondata-1006

Git Tag:
carbondata-0.2.0-incubating

Please vote to approve this release:
[ ] +1 Approve the release
[ ] -1 Don't approve the release (please provide specific comments)

This vote will be open for at least 72 hours. If this vote passes (we need
at least 3 binding votes, meaning three votes from the PPMC), I will
forward to gene...@incubator.apache.org for  the IPMC votes.

Here is my vote : +1 (binding)

Regards
Liang

Re: As planned, we are ready to make Apache CarbonData 0.2.0 release:

2016-11-09 Thread foryou2030
+1
regards
Gin

Sent from my iPhone

> On November 10, 2016, at 3:25 AM, Kumar Vishal  wrote:
> 
> +1
> -Regards
> Kumar Vishal
> 
>> On Nov 9, 2016 08:04, "Jacky Li"  wrote:
>> 
>> +1
>> 
>> Regards,
>> Jacky
>> 
>>> On November 9, 2016, at 9:05 AM, Jay <2550062...@qq.com> wrote:
>>> 
>>> +1
>>> regards
>>> Jay
>>> 
>>> 
>>> 
>>> 
>>> -- Original Message --
>>> From: "向志强";;
>>> Sent: Wednesday, November 9, 2016, 8:59 AM
>>> To: "dev";
>>> 
>>> Subject: Re: As planned, we are ready to make Apache CarbonData 0.2.0 release:
>>> 
>>> 
>>> 
>>> Not having to install Thrift to build the project is great.
>>> 
>>> 2016-11-08 23:16 GMT+08:00 QiangCai :
>>> 
 I look forward to the release of this version.
 CarbonData improved query and load performance, and it is good news that there is
 no need to install Thrift to build the project.
 Btw, how many PRs were merged into this version?
 
 
 
 --
 View this message in context: http://apache-carbondata-
 mailing-list-archive.1130556.n5.nabble.com/As-planed-we-
 are-ready-to-make-Apache-CarbonData-0-2-0-release-tp2738p2752.html
 Sent from the Apache CarbonData Mailing List archive mailing list
>> archive
 at Nabble.com.
> 



RE: [VOTE] Apache CarbonData 0.2.0-incubating release

2016-11-09 Thread Jihong Ma
+1 binding.

Jihong

-Original Message-
From: Liang Chen [mailto:chenliang6...@gmail.com] 
Sent: Wednesday, November 09, 2016 3:18 PM
To: dev@carbondata.incubator.apache.org
Subject: [VOTE] Apache CarbonData 0.2.0-incubating release

Hi all,

I submit the CarbonData 0.2.0-incubating to your vote.

Release Notes:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12337896

Staging Repository:
https://repository.apache.org/content/repositories/orgapachecarbondata-1006

Git Tag:
carbondata-0.2.0-incubating

Please vote to approve this release:
[ ] +1 Approve the release
[ ] -1 Don't approve the release (please provide specific comments)

This vote will be open for at least 72 hours. If this vote passes (we need
at least 3 binding votes, meaning three votes from the PPMC), I will
forward to gene...@incubator.apache.org for  the IPMC votes.

Here is my vote : +1 (binding)

Regards
Liang


[VOTE] Apache CarbonData 0.2.0-incubating release

2016-11-09 Thread Liang Chen
Hi all,

I submit the CarbonData 0.2.0-incubating to your vote.

Release Notes:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12337896

Staging Repository:
https://repository.apache.org/content/repositories/orgapachecarbondata-1006

Git Tag:
carbondata-0.2.0-incubating

Please vote to approve this release:
[ ] +1 Approve the release
[ ] -1 Don't approve the release (please provide specific comments)

This vote will be open for at least 72 hours. If this vote passes (we need
at least 3 binding votes, meaning three votes from the PPMC), I will
forward to gene...@incubator.apache.org for  the IPMC votes.

Here is my vote : +1 (binding)

Regards
Liang


Re: As planned, we are ready to make Apache CarbonData 0.2.0 release:

2016-11-09 Thread Kumar Vishal
+1
-Regards
Kumar Vishal

On Nov 9, 2016 08:04, "Jacky Li"  wrote:

> +1
>
> Regards,
> Jacky
>
> > On November 9, 2016, at 9:05 AM, Jay <2550062...@qq.com> wrote:
> >
> > +1
> > regards
> > Jay
> >
> >
> >
> >
> > -- Original Message --
> > From: "向志强";;
> > Sent: Wednesday, November 9, 2016, 8:59 AM
> > To: "dev";
> >
> > Subject: Re: As planned, we are ready to make Apache CarbonData 0.2.0 release:
> >
> >
> >
> > Not having to install Thrift to build the project is great.
> >
> > 2016-11-08 23:16 GMT+08:00 QiangCai :
> >
> >> I look forward to the release of this version.
> >> CarbonData improved query and load performance, and it is good news that there is
> >> no need to install Thrift to build the project.
> >> Btw, how many PRs were merged into this version?
> >>
> >>
> >>
> >> --
> >> View this message in context: http://apache-carbondata-
> >> mailing-list-archive.1130556.n5.nabble.com/As-planed-we-
> >> are-ready-to-make-Apache-CarbonData-0-2-0-release-tp2738p2752.html
> >> Sent from the Apache CarbonData Mailing List archive mailing list
> archive
> >> at Nabble.com.
>
>
>
>


[DISCUSS] Improve Statistics and Profiling support

2016-11-09 Thread Venkata Gollamudi
Hi,


In CarbonData, the LOG4J level "STATISTICS" is currently available for logging.
However, the information is incomplete for debugging performance problems, and it is
not easy to see the statistics and profiling information of one query in one
place.
So we need to revisit and improve statistics and profiling.
I have put down some pointers below so we can discuss them.



What to collect
---
1) Statistics of table/columns

 like number of files, number of blocks, number of blocklets


2) Profiling information required to debug performance issues and resource
utilization.

 scan statistics like row size, number of blocks or blocklets scanned,
distribution info, scan buffer size.

 I/O and CPU/compute cost.

 driver index effectiveness: number of blocks hit

 executor index effectiveness: number of blocklets hit

 decoding and decompression cost and memory required.

 Cache statistics: hits, misses, memory occupied.

 Dictionary statistics: number of entries, dictionary load time, memory
occupied.

 Btree statistics: number of entries, Btree load time, lookup cost, memory
occupied.

3) Data load:

 load time, memory required, encoding and compression cost.

4) Spark time and Shuffle cost.


How to collect:
---
Check whether this can be plugged into the Spark metrics/counters system.
Have a decorator statistics RDD in between to wrap each RDD and collect
statistics, or any other method to get the numbers from Spark; a rough sketch of the
decorator idea is given right after this section.
Make it pluggable so it can integrate with other processing frameworks, so that we
can get end-to-end statistics.
Something like log4j, with clean interfaces to put and retrieve information.
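
A rough sketch of the decorator idea in plain Scala (the names below are illustrative, not an
existing CarbonData class): a wrapping iterator that records result rows and time spent in the
underlying scan, from which e.g. the filter ratio can be derived once the scanned-row count is
also fed in.

    import java.util.concurrent.atomic.AtomicLong

    // Illustrative only. The filter ratio needs both the pre-filter (scanned) and
    // post-filter (result) row counts, fed by different layers of the scan.
    case class ScanStatistics(scannedRows: AtomicLong = new AtomicLong(0),
                              resultRows: AtomicLong = new AtomicLong(0),
                              scanTimeNanos: AtomicLong = new AtomicLong(0)) {
      def filterRatio: Double =
        if (scannedRows.get() == 0) 0.0
        else 1.0 - resultRows.get().toDouble / scannedRows.get()
    }

    // Decorator iterator: wraps the row iterator of one partition and records the
    // number of result rows and the time spent pulling them from the scan below it.
    class StatisticsIterator[T](underlying: Iterator[T], stats: ScanStatistics)
        extends Iterator[T] {
      override def hasNext: Boolean = underlying.hasNext
      override def next(): T = {
        val start = System.nanoTime()
        val row = underlying.next()
        stats.scanTimeNanos.addAndGet(System.nanoTime() - start)
        stats.resultRows.incrementAndGet()
        row
      }
    }

A per-query collector could then aggregate these objects and publish them to the Spark metrics
system or a log, which ties into the "Where to store" options below.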



Where to store:
---
In a separate table
In logs
As history information, like it is stored in Spark (maybe as JSON). Is Spark's
history/statistics logging separate enough to reuse across frameworks?
The collector can gather the statistics and decide where to store them.



How to see:
---
A command to retrieve various statistics and profiling info.
Connections to other metrics displays like the Spark UI or Ganglia.


Links:
--
Profiling support in Impala: http://www.cloudera.com/documentation/enterprise/5-7-x/topics/impala_explain_plan.html#perf_profile
Table and column statistics in Impala: http://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_perf_stats.html#perf_table_stats
Spark metrics collection: http://spark.apache.org/docs/latest/monitoring.html#metrics


Regards,

Venkata Ramana Gollamudi


[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87218055
  
--- Diff: 
examples/src/main/scala/org/apache/carbondata/examples/CarbonExample1.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.carbondata.examples.util.ExampleUtils
+
+object CarbonExample1 {
--- End diff --

This file is added by mistake


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87217857
  
--- Diff: conf/ss.txt ---
@@ -0,0 +1,122 @@
+
+Release Notes - CarbonData - Version 0.1.0-incubating
--- End diff --

Why is this file added?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (CARBONDATA-400) [Bad Records] Load data is fail and displaying the string value in beeline as exception

2016-11-09 Thread MAKAMRAGHUVARDHAN (JIRA)
MAKAMRAGHUVARDHAN created CARBONDATA-400:


 Summary: [Bad Records] Load data is fail and displaying the string 
value in beeline as exception
 Key: CARBONDATA-400
 URL: https://issues.apache.org/jira/browse/CARBONDATA-400
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Affects Versions: 0.1.0-incubating
 Environment: 3node cluster
Reporter: MAKAMRAGHUVARDHAN
Priority: Minor


Steps
1. Create table
CREATE TABLE String_test2 (string_col string) STORED BY 
'org.apache.carbondata.format';
2. Load the data with the parameter 'BAD_RECORDS_ACTION'='FORCE' while the CSV contains a 
string value that is out of bounds.

LOAD DATA INPATH 'hdfs://hacluster/Carbon/Priyal/string5.csv' into table 
String_test2 OPTIONS('DELIMITER'=',' , 
'QUOTECHAR'='"','BAD_RECORDS_LOGGER_ENABLE'='TRUE', 
'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='string_col');

Actual Result: The data load fails and the string value is displayed in beeline 
as the exception trace.
Expected Result: A valid exception message should be displayed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-399) [Bad Records] Data Load is not FAILED even bad_records_action="FAIL" .

2016-11-09 Thread Babulal (JIRA)
Babulal created CARBONDATA-399:
--

 Summary: [Bad Records] Data Load is not FAILED even  
bad_records_action="FAIL" .
 Key: CARBONDATA-399
 URL: https://issues.apache.org/jira/browse/CARBONDATA-399
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Affects Versions: 0.1.0-incubating
 Environment: SUSE 11 SP4
YARN HA 
3 Nodes

Reporter: Babulal
Priority: Minor


The data load is not FAILED when string data is loaded into an int column.


1. Create table  defect_5 (imei string ,deviceInformationId int,mac 
string,productdate timestamp,updatetime timestamp,gamePointId 
double,contractNumber double) stored by 'carbondata' 
TBLPROPERTIES('DICTIONARY_INCLUDE'='deviceInformationId') ;


deviceInformationId is int (it will be handled as a dimension). Now load the 
data:


2.  0: jdbc:hive2://ha-cluster/default> LOAD DATA  inpath 
'hdfs://hacluster/tmp/100_default_date_11_header_2.csv' into table defect_5 
options('DELIMITER'=',', 'bad_records_action'='FAIL',  
'QUOTECHAR'='"','FILEHEADER'='imei,deviceinformationid,mac,productdate,updatetime,gamepointid,contractnumber');
+-+--+
| Result  |
+-+--+
+-+--+
No rows selected (0.969 seconds)


3. Data 
imei,deviceinformationid,mac,productdate,updatetime,gamepointid,contractnumber
1AA1,babu,Mikaa1,2015-01-01 11:00:00,2015-01-01 13:00:00,10,260
1AA2,3,Mikaa2,2015-01-02 12:00:00,2015-01-01 14:00:00,278,230
1AA3,1,Mikaa1,2015-01-03 13:00:00,2015-01-01 15:00:00,2556,1
1AA4,10,Mikaa2,2015-01-04 14:00:00,2015-01-01 16:00:00,640,254
1AA5,10,Mikaa,2015-01-05 15:00:00,2015-01-01 17:00:00,980,256
1AA6,10,Mikaa,2015-01-06 16:00:00,2015-01-01 18:00:00,1,2378
1AA7,10,Mikaa,2015-01-07 17:00:00,2015-01-01 19:00:00,96,234
1AA8,9,max,2015-01-08 18:00:00,2015-01-01 20:00:00,89,236
1AA9,10,max,2015-01-09 19:00:00,2015-01-01 21:00:00,198.36,239.2



Expected output: The data load should FAIL.
 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-carbondata pull request #308: [CARBONDATA-398] In DropCarbonTable ...

2016-11-09 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/308#discussion_r87191882
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -1284,14 +1284,18 @@ private[sql] case class 
DropTableCommand(ifExistsSet: Boolean, databaseNameOp: O
   .getCarbonLockObj(carbonTableIdentifier, LockUsage.DROP_TABLE_LOCK)
 val storePath = 
CarbonEnv.getInstance(sqlContext).carbonCatalog.storePath
 var isLocked = false
+val tmpTable = 
org.apache.carbondata.core.carbon.metadata.CarbonMetadata.getInstance()
+  .getCarbonTable(dbName + '_' + tableName)
 try {
-  isLocked = carbonLock.lockWithRetries()
-  if (isLocked) {
-logInfo("Successfully able to get the lock for drop.")
-  }
-  else {
-LOGGER.audit(s"Dropping table $dbName.$tableName failed as the 
Table is locked")
-sys.error("Table is locked for deletion. Please try after some 
time")
+  if (null != tmpTable) {
--- End diff --

It is better to use 
`CarbonEnv.getInstance(sqlContext).carbonCatalog.tableExists` 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (CARBONDATA-398) In DropCarbonTable flow, Metadata lock should be acquired only if table exist

2016-11-09 Thread Naresh P R (JIRA)
Naresh P R created CARBONDATA-398:
-

 Summary: In DropCarbonTable flow, Metadata lock should be acquired 
only if table exist
 Key: CARBONDATA-398
 URL: https://issues.apache.org/jira/browse/CARBONDATA-398
 Project: CarbonData
  Issue Type: Bug
Reporter: Naresh P R
Assignee: Naresh P R
Priority: Trivial


Issue: In the drop table flow, we acquire the metadata lock even if the table does not exist,
which ends up creating the table folder and a meta.lock file.
Solution: Check first and acquire the lock only if the carbon table exists.
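
A minimal sketch of the ordering this fix aims for (illustrative Scala only; the trait and
method names mimic, but are not, the actual CarbonData lock and catalog API used in PR #308):

    object DropTableSketch {
      trait MetadataLock {
        def lockWithRetries(): Boolean
        def unlock(): Boolean
      }

      def dropTable(tableExists: Boolean, acquireLock: () => MetadataLock): Unit = {
        // 1. Check existence first, so DROP on a missing table never creates the
        //    table folder or a meta.lock file as a side effect of locking.
        if (!tableExists) {
          sys.error("Table does not exist")
        }
        // 2. Only an existing table pays the cost of acquiring the metadata lock.
        val lock = acquireLock()
        if (!lock.lockWithRetries()) {
          sys.error("Table is locked for deletion. Please try after some time")
        }
        try {
          // ... drop schema metadata and data files ...
        } finally {
          lock.unlock()
        }
      }
    }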



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-carbondata pull request #308: [WIP] In DropCarbonTable flow, Metad...

2016-11-09 Thread nareshpr
GitHub user nareshpr opened a pull request:

https://github.com/apache/incubator-carbondata/pull/308

[WIP] In DropCarbonTable flow, Metadata lock should be acquired only if 
tab…

Issue: In the drop table flow, we acquire the metadata lock even if the table does not 
exist, which ends up creating the table folder and a meta.lock file.
Solution: Check first and acquire the lock only if the carbon table exists.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nareshpr/incubator-carbondata droptablefix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #308


commit 20813ccbffae1aa48bdef069721aeeebc550bc04
Author: nareshpr 
Date:   2016-11-09T12:43:36Z

In DropCarbonTable flow, Metadata lock should be acquired only if table 
exist




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #207: [CARBONDATA-283] VT enhancement for ...

2016-11-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/207


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (CARBONDATA-397) Use of ANTLR instead of CarbonSqlParser for parsing queries

2016-11-09 Thread Anurag Srivastava (JIRA)
Anurag Srivastava created CARBONDATA-397:


 Summary: Use of ANTLR instead of CarbonSqlParser for parsing 
queries
 Key: CARBONDATA-397
 URL: https://issues.apache.org/jira/browse/CARBONDATA-397
 Project: CarbonData
  Issue Type: Improvement
Reporter: Anurag Srivastava
Priority: Minor


We are using CarbonSqlParser for parsing queries, but we could use ANTLR for the 
same purpose. Query parsing could be handled better with ANTLR.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-carbondata pull request #307: [Carbondata-396] Implement test case...

2016-11-09 Thread harmeetsingh0013
GitHub user harmeetsingh0013 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/307

[Carbondata-396] Implement test cases for datastorage package

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[CARBONDATA-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).
 - [x] Testing done
 
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- What manual testing you have done?
- Any additional information to help reviewers in testing this 
change.
 
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
 
---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/harmeetsingh0013/incubator-carbondata 
CARBONDATA-396

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/307.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #307


commit f007240376a11a9f2e1e172cc2bffd4b1ad4340a
Author: harmeetsingh0013 
Date:   2016-11-03T12:09:37Z

Write unit test cases for ColumnDictionaryInfo

commit 3802bbf528cb3deceef15cdd1bb4e48073a4570f
Author: harmeetsingh0013 
Date:   2016-11-03T12:38:13Z

Add apache license in javadocs

commit f76594dc13b8c4a690aa931b9c4d6982c23615c3
Author: harmeetsingh0013 
Date:   2016-11-04T05:26:51Z

Merge branch 'master' of github.com:apache/incubator-carbondata into 
CARBONDATA-371

commit 9f672e56a4a96998a556ac454811393edf216562
Author: harmeetsingh0013 
Date:   2016-11-08T05:45:13Z

Merge branch 'master' of github.com:apache/incubator-carbondata into 
CARBONDATA-371

commit 0413e2be90fbe7d413efddaacc9b814295cee14f
Author: harmeetsingh0013 
Date:   2016-11-09T09:36:34Z

Write unit test case for BlockIndexerStorageForInt UnBlockIndexerTest 
ExcludeColGroupFilterExecuterImpl classes

commit 69fe75461caf2bceda578d6b655ddf702fe943aa
Author: harmeetsingh0013 
Date:   2016-11-09T09:37:26Z

Write unit test case for BlockIndexerStorageForInt UnBlockIndexerTest 
ExcludeColGroupFilterExecuterImpl classes

commit a131b544e765c05efc97dbe641ad15391f436de2
Author: harmeetsingh0013 
Date:   2016-11-09T09:37:42Z

Merge branch 'master' of github.com:apache/incubator-carbondata into 
CARBONDATA-396




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87172561
  
--- Diff: 
integration/spark/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java
 ---
@@ -215,6 +227,105 @@ public static void executeGraph(CarbonLoadModel 
loadModel, String storeLocation,
 info, loadModel.getPartitionId(), 
loadModel.getCarbonDataLoadSchema());
   }
 
+  public static void executeNewDataLoad(CarbonLoadModel loadModel, String 
storeLocation,
+  String hdfsStoreLocation, RecordReader[] recordReaders)
+  throws Exception {
+if (!new File(storeLocation).mkdirs()) {
+  LOGGER.error("Error while creating the temp store path: " + 
storeLocation);
+}
+CarbonDataLoadConfiguration configuration = new 
CarbonDataLoadConfiguration();
+String databaseName = loadModel.getDatabaseName();
+String tableName = loadModel.getTableName();
+String tempLocationKey = databaseName + 
CarbonCommonConstants.UNDERSCORE + tableName
++ CarbonCommonConstants.UNDERSCORE + loadModel.getTaskNo();
+CarbonProperties.getInstance().addProperty(tempLocationKey, 
storeLocation);
+CarbonProperties.getInstance()
+.addProperty(CarbonCommonConstants.STORE_LOCATION_HDFS, 
hdfsStoreLocation);
+// CarbonProperties.getInstance().addProperty("store_output_location", 
outPutLoc);
+CarbonProperties.getInstance().addProperty("send.signal.load", 
"false");
+
+CarbonTable carbonTable = 
loadModel.getCarbonDataLoadSchema().getCarbonTable();
+AbsoluteTableIdentifier identifier =
+carbonTable.getAbsoluteTableIdentifier();
+configuration.setTableIdentifier(identifier);
+String csvHeader = loadModel.getCsvHeader();
+String csvFileName = null;
+if (csvHeader != null && !csvHeader.isEmpty()) {
+  
configuration.setHeader(CarbonDataProcessorUtil.getColumnFields(csvHeader, 
","));
+} else {
+  CarbonFile csvFile =
+  
CarbonDataProcessorUtil.getCsvFileToRead(loadModel.getFactFilesToProcess().get(0));
+  csvFileName = csvFile.getName();
+  csvHeader = CarbonDataProcessorUtil.getFileHeader(csvFile);
+  configuration.setHeader(
+  CarbonDataProcessorUtil.getColumnFields(csvHeader, 
loadModel.getCsvDelimiter()));
+}
+CarbonDataProcessorUtil
+.validateHeader(loadModel.getTableName(), csvHeader, 
loadModel.getCarbonDataLoadSchema(),
+loadModel.getCsvDelimiter(), csvFileName);
+
+configuration.setPartitionId(loadModel.getPartitionId());
+configuration.setSegmentId(loadModel.getSegmentId());
+configuration.setTaskNo(loadModel.getTaskNo());
+
configuration.setDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS,
+new String[] { loadModel.getComplexDelimiterLevel1(),
+loadModel.getComplexDelimiterLevel2() });
+
configuration.setDataLoadProperty(DataLoadProcessorConstants.SERIALIZATION_NULL_FORMAT,
+loadModel.getSerializationNullFormat().split(",")[1]);
+
configuration.setDataLoadProperty(DataLoadProcessorConstants.FACT_TIME_STAMP,
+loadModel.getFactTimeStamp());
+
configuration.setDataLoadProperty(DataLoadProcessorConstants.BAD_RECORDS_LOGGER_ENABLE,
+loadModel.getBadRecordsLoggerEnable().split(",")[1]);
+
configuration.setDataLoadProperty(DataLoadProcessorConstants.BAD_RECORDS_LOGGER_ACTION,
+loadModel.getBadRecordsAction().split(",")[1]);
+
configuration.setDataLoadProperty(DataLoadProcessorConstants.FACT_FILE_PATH,
+loadModel.getFactFilePath());
+List dimensions =
+
carbonTable.getDimensionByTableName(carbonTable.getFactTableName());
+List measures =
+carbonTable.getMeasureByTableName(carbonTable.getFactTableName());
+Map dateFormatMap =
+
CarbonDataProcessorUtil.getDateFormatMap(loadModel.getDateFormat());
+List dataFields = new ArrayList<>();
+List complexDataFields = new ArrayList<>();
+
+// First add dictionary and non dictionary dimensions because these 
are part of mdk key.
+// And then add complex data types and measures.
+for (CarbonColumn column : dimensions) {
+  DataField dataField = new DataField(column);
+  dataField.setDateFormat(dateFormatMap.get(column.getColName()));
+  if (column.isComplex()) {
+complexDataFields.add(dataField);
+  } else {
+dataFields.add(dataField);
+  }
+}
+dataFields.addAll(complexDataFields);
+for (CarbonColumn column : measures) {
+  // This dummy measure is added when no measure was present. We no 
need to load it.
+  if (!(column.getColNam

[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87171862
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/converter/impl/RowConverterImpl.java
 ---
@@ -43,35 +45,58 @@
 
   private CarbonDataLoadConfiguration configuration;
 
+  private DataField[] fields;
+
   private FieldConverter[] fieldConverters;
 
-  public RowConverterImpl(DataField[] fields, CarbonDataLoadConfiguration 
configuration) {
+  private BadRecordslogger badRecordLogger;
+
+  private BadRecordLogHolder logHolder;
+
+  public RowConverterImpl(DataField[] fields, CarbonDataLoadConfiguration 
configuration,
+  BadRecordslogger badRecordLogger) {
+this.fields = fields;
 this.configuration = configuration;
+this.badRecordLogger = badRecordLogger;
+  }
+
+  @Override
+  public void initialize() {
 CacheProvider cacheProvider = CacheProvider.getInstance();
 Cache cache =
 cacheProvider.createCache(CacheType.REVERSE_DICTIONARY,
 configuration.getTableIdentifier().getStorePath());
+String nullFormat =
+
configuration.getDataLoadProperty(DataLoadProcessorConstants.SERIALIZATION_NULL_FORMAT)
+.toString();
 List fieldConverterList = new ArrayList<>();
 
 long lruCacheStartTime = System.currentTimeMillis();
 
 for (int i = 0; i < fields.length; i++) {
   FieldConverter fieldConverter = FieldEncoderFactory.getInstance()
   .createFieldEncoder(fields[i], cache,
-  
configuration.getTableIdentifier().getCarbonTableIdentifier(), i);
-  if (fieldConverter != null) {
-fieldConverterList.add(fieldConverter);
-  }
+  
configuration.getTableIdentifier().getCarbonTableIdentifier(), i, nullFormat);
+  fieldConverterList.add(fieldConverter);
 }
 CarbonTimeStatisticsFactory.getLoadStatisticsInstance()
 .recordLruCacheLoadTime((System.currentTimeMillis() - 
lruCacheStartTime) / 1000.0);
 fieldConverters = fieldConverterList.toArray(new 
FieldConverter[fieldConverterList.size()]);
+logHolder = new BadRecordLogHolder();
   }
 
   @Override
   public CarbonRow convert(CarbonRow row) throws 
CarbonDataLoadingException {
+CarbonRow copy = row.getCopy();
--- End diff --

It is required because the fieldConverters update the row in place; if a bad record 
is found in the last column, we don't want to hand the already-converted data to the logger.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87171880
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/sort/impl/ParallelReadMergeSorterImpl.java
 ---
@@ -102,21 +102,14 @@ public void initialize(SortParameters sortParameters) 
{
 }
 this.executorService = Executors.newFixedThreadPool(iterators.length);
 
-// First prepare the data for sort.
-Iterator[] sortPrepIterators = new 
Iterator[iterators.length];
-for (int i = 0; i < sortPrepIterators.length; i++) {
-  sortPrepIterators[i] = new SortPreparatorIterator(iterators[i], 
inputDataFields);
-}
-
-for (int i = 0; i < sortDataRows.length; i++) {
-  executorService
-  .submit(new SortIteratorThread(sortPrepIterators[i], 
sortDataRows[i], sortParameters));
-}
-
 try {
+  for (int i = 0; i < sortDataRows.length; i++) {
+executorService
+.submit(new SortIteratorThread(iterators[i], sortDataRows[i], 
sortParameters));
--- End diff --

ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87171144
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/IntermediateFileMerger.java
 ---
@@ -110,7 +116,11 @@ public IntermediateFileMerger(SortParameters 
mergerParameters, File[] intermedia
   initialize();
 
   while (hasNext()) {
-writeDataTofile(next());
+if (useKettle) {
--- End diff --

Ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87171167
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/SortProcessorStepImpl.java
 ---
@@ -50,6 +50,7 @@ public SortProcessorStepImpl(CarbonDataLoadConfiguration 
configuration,
 
   @Override
   public void initialize() throws CarbonDataLoadingException {
+super.initialize();
--- End diff --

Ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87171187
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/DataConverterProcessorStepImpl.java
 ---
@@ -47,20 +58,109 @@ public 
DataConverterProcessorStepImpl(CarbonDataLoadConfiguration configuration,
 
   @Override
   public void initialize() throws CarbonDataLoadingException {
-encoder = new RowConverterImpl(child.getOutput(), configuration);
-child.initialize();
+super.initialize();
+BadRecordslogger badRecordLogger = createBadRecordLogger();
+converter = new RowConverterImpl(child.getOutput(), configuration, 
badRecordLogger);
+converter.initialize();
+  }
+
+  /**
+   * Create the iterator using child iterator.
+   *
+   * @param childIter
+   * @return new iterator with step specific processing.
+   */
+  @Override
+  protected Iterator getIterator(final 
Iterator childIter) {
+return new CarbonIterator() {
+  RowConverter localConverter = converter.createCopyForNewThread();
+  @Override public boolean hasNext() {
+return childIter.hasNext();
+  }
+
+  @Override public CarbonRowBatch next() {
+return processRowBatch(childIter.next(), localConverter);
+  }
+};
+  }
+
+  /**
+   * Process the batch of rows as per the step logic.
+   *
+   * @param rowBatch
+   * @return processed row.
+   */
+  protected CarbonRowBatch processRowBatch(CarbonRowBatch rowBatch, 
RowConverter localConverter) {
+CarbonRowBatch newBatch = new CarbonRowBatch();
+Iterator batchIterator = rowBatch.getBatchIterator();
+while (batchIterator.hasNext()) {
+  newBatch.addRow(localConverter.convert(batchIterator.next()));
+}
+return newBatch;
   }
 
   @Override
   protected CarbonRow processRow(CarbonRow row) {
-return encoder.convert(row);
+// Not implemented
+return null;
+  }
+
+  private BadRecordslogger createBadRecordLogger() {
+boolean badRecordsLogRedirect = false;
+boolean badRecordConvertNullDisable = false;
+boolean badRecordsLoggerEnable = Boolean.parseBoolean(
+
configuration.getDataLoadProperty(DataLoadProcessorConstants.BAD_RECORDS_LOGGER_ENABLE)
+.toString());
+Object bad_records_action =
+
configuration.getDataLoadProperty(DataLoadProcessorConstants.BAD_RECORDS_LOGGER_ACTION)
+.toString();
+if (null != bad_records_action) {
+  LoggerAction loggerAction = null;
+  try {
+loggerAction = 
LoggerAction.valueOf(bad_records_action.toString().toUpperCase());
+  } catch (IllegalArgumentException e) {
+loggerAction = LoggerAction.FORCE;
+  }
+  switch (loggerAction) {
+case FORCE:
+  badRecordConvertNullDisable = false;
+  break;
+case REDIRECT:
+  badRecordsLogRedirect = true;
+  badRecordConvertNullDisable = true;
+  break;
+case IGNORE:
+  badRecordsLogRedirect = false;
+  badRecordConvertNullDisable = true;
+  break;
+  }
+}
+CarbonTableIdentifier identifier =
+configuration.getTableIdentifier().getCarbonTableIdentifier();
+String key = identifier.getDatabaseName() + '/' + 
identifier.getTableName() + '_' + identifier
+.getTableName();
+BadRecordslogger badRecordslogger =
+new BadRecordslogger(key, identifier.getTableName() + '_' + 
System.currentTimeMillis(),
+getBadLogStoreLocation(
+identifier.getDatabaseName() + '/' + 
identifier.getTableName() + "/" + configuration
--- End diff --

ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170923
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortParameters.java
 ---
@@ -122,6 +116,11 @@
 
   private int numberOfCores;
 
+  /**
+   * Temporary conf , it will be removed after refactor.
--- End diff --

ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170873
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/BadRecordslogger.java
 ---
@@ -81,13 +81,24 @@
*/
   private String taskKey;
 
+  private boolean badRecordsLogRedirect;
+
+  private boolean badRecordLoggerEnable;
+
+  private boolean badRecordConvertNullDisable;
+
   // private final Object syncObject =new Object();
 
-  public BadRecordslogger(String key, String fileName, String storePath) {
+  public BadRecordslogger(String key, String fileName, String storePath,
--- End diff --

ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170942
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortDataRows.java
 ---
@@ -264,6 +277,72 @@ private void writeData(Object[][] recordHolderList, 
int entryCountLocal, File fi
 }
   }
 
+  private void writeDataWithOutKettle(Object[][] recordHolderList, int 
entryCountLocal, File file)
+  throws CarbonSortKeyAndGroupByException {
+DataOutputStream stream = null;
+try {
+  // open stream
+  stream = new DataOutputStream(new BufferedOutputStream(new 
FileOutputStream(file),
+  parameters.getFileWriteBufferSize()));
+
+  // write number of entries to the file
+  stream.writeInt(entryCountLocal);
+  int complexDimColCount = parameters.getComplexDimColCount();
+  int dimColCount = parameters.getDimColCount() + complexDimColCount;
+  char[] aggType = parameters.getAggType();
+  boolean[] noDictionaryDimnesionMapping = 
parameters.getNoDictionaryDimnesionColumn();
+  Object[] row = null;
+  for (int i = 0; i < entryCountLocal; i++) {
+// get row from record holder list
+row = recordHolderList[i];
+int dimCount = 0;
--- End diff --

ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170892
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortTempFileChunkHolder.java
 ---
@@ -136,6 +136,9 @@
*/
   private boolean[] isNoDictionaryDimensionColumn;
 
+  // temporary configuration
+  private boolean useKettle;
--- End diff --

ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170828
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
 ---
@@ -396,4 +407,223 @@ private static void 
addAllComplexTypeChildren(CarbonDimension dimension, StringB
 }
 return complexTypesMap;
   }
+
+  /**
+   * Get the csv file to read if it the path is file otherwise get the 
first file of directory.
+   *
+   * @param csvFilePath
+   * @return File
+   */
+  public static CarbonFile getCsvFileToRead(String csvFilePath) {
+CarbonFile csvFile =
+FileFactory.getCarbonFile(csvFilePath, 
FileFactory.getFileType(csvFilePath));
+
+CarbonFile[] listFiles = null;
+if (csvFile.isDirectory()) {
+  listFiles = csvFile.listFiles(new CarbonFileFilter() {
+@Override public boolean accept(CarbonFile pathname) {
+  if (!pathname.isDirectory()) {
+if 
(pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || 
pathname
+
.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION
++ CarbonCommonConstants.FILE_INPROGRESS_STATUS)) {
+  return true;
+}
+  }
+
+  return false;
+}
+  });
+} else {
+  listFiles = new CarbonFile[1];
+  listFiles[0] = csvFile;
+
--- End diff --

ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170805
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
 ---
@@ -396,4 +407,223 @@ private static void 
addAllComplexTypeChildren(CarbonDimension dimension, StringB
 }
 return complexTypesMap;
   }
+
+  /**
+   * Get the csv file to read if it the path is file otherwise get the 
first file of directory.
+   *
+   * @param csvFilePath
+   * @return File
+   */
+  public static CarbonFile getCsvFileToRead(String csvFilePath) {
+CarbonFile csvFile =
+FileFactory.getCarbonFile(csvFilePath, 
FileFactory.getFileType(csvFilePath));
+
+CarbonFile[] listFiles = null;
+if (csvFile.isDirectory()) {
+  listFiles = csvFile.listFiles(new CarbonFileFilter() {
+@Override public boolean accept(CarbonFile pathname) {
+  if (!pathname.isDirectory()) {
+if 
(pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || 
pathname
+
.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION
++ CarbonCommonConstants.FILE_INPROGRESS_STATUS)) {
+  return true;
+}
+  }
+
--- End diff --

ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170789
  
--- Diff: 
integration/spark/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java
 ---
@@ -215,6 +227,105 @@ public static void executeGraph(CarbonLoadModel 
loadModel, String storeLocation,
 info, loadModel.getPartitionId(), 
loadModel.getCarbonDataLoadSchema());
   }
 
+  public static void executeNewDataLoad(CarbonLoadModel loadModel, String 
storeLocation,
+  String hdfsStoreLocation, RecordReader[] recordReaders)
+  throws Exception {
+if (!new File(storeLocation).mkdirs()) {
+  LOGGER.error("Error while creating the temp store path: " + 
storeLocation);
+}
+CarbonDataLoadConfiguration configuration = new 
CarbonDataLoadConfiguration();
+String databaseName = loadModel.getDatabaseName();
+String tableName = loadModel.getTableName();
+String tempLocationKey = databaseName + 
CarbonCommonConstants.UNDERSCORE + tableName
++ CarbonCommonConstants.UNDERSCORE + loadModel.getTaskNo();
+CarbonProperties.getInstance().addProperty(tempLocationKey, 
storeLocation);
+CarbonProperties.getInstance()
+.addProperty(CarbonCommonConstants.STORE_LOCATION_HDFS, 
hdfsStoreLocation);
+// CarbonProperties.getInstance().addProperty("store_output_location", 
outPutLoc);
+CarbonProperties.getInstance().addProperty("send.signal.load", 
"false");
+
+CarbonTable carbonTable = 
loadModel.getCarbonDataLoadSchema().getCarbonTable();
+AbsoluteTableIdentifier identifier =
+carbonTable.getAbsoluteTableIdentifier();
+configuration.setTableIdentifier(identifier);
+String csvHeader = loadModel.getCsvHeader();
+String csvFileName = null;
+if (csvHeader != null && !csvHeader.isEmpty()) {
+  
configuration.setHeader(CarbonDataProcessorUtil.getColumnFields(csvHeader, 
","));
+} else {
+  CarbonFile csvFile =
+  
CarbonDataProcessorUtil.getCsvFileToRead(loadModel.getFactFilesToProcess().get(0));
+  csvFileName = csvFile.getName();
+  csvHeader = CarbonDataProcessorUtil.getFileHeader(csvFile);
+  configuration.setHeader(
+  CarbonDataProcessorUtil.getColumnFields(csvHeader, 
loadModel.getCsvDelimiter()));
+}
+CarbonDataProcessorUtil
+.validateHeader(loadModel.getTableName(), csvHeader, 
loadModel.getCarbonDataLoadSchema(),
+loadModel.getCsvDelimiter(), csvFileName);
--- End diff --

Ok. Modified.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170848
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java
 ---
@@ -952,10 +951,9 @@ private String getCarbonLocalBaseStoreLocation() {
 // In that case it will have first value empty and other values will 
be null
 // So If records is coming like this then we need to write this 
records as a bad Record.
 
-if (null == r[0] && badRecordConvertNullDisable) {
+if (null == r[0] && badRecordslogger.isBadRecordConvertNullDisable()) {
   badRecordslogger
-  .addBadRecordsToBuilder(r, "Column Names are coming NULL", 
"null",
-  badRecordsLogRedirect, badRecordsLoggerEnable);
+  .addBadRecordsToBuilder(r, "Column Names are coming NULL", 
"null");
--- End diff --

ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-09 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r87170688
  
--- Diff: 
integration/spark/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java
 ---
@@ -215,6 +227,105 @@ public static void executeGraph(CarbonLoadModel 
loadModel, String storeLocation,
 info, loadModel.getPartitionId(), 
loadModel.getCarbonDataLoadSchema());
   }
 
+  public static void executeNewDataLoad(CarbonLoadModel loadModel, String 
storeLocation,
+  String hdfsStoreLocation, RecordReader[] recordReaders)
+  throws Exception {
+if (!new File(storeLocation).mkdirs()) {
+  LOGGER.error("Error while creating the temp store path: " + 
storeLocation);
+}
+CarbonDataLoadConfiguration configuration = new 
CarbonDataLoadConfiguration();
+String databaseName = loadModel.getDatabaseName();
+String tableName = loadModel.getTableName();
+String tempLocationKey = databaseName + 
CarbonCommonConstants.UNDERSCORE + tableName
++ CarbonCommonConstants.UNDERSCORE + loadModel.getTaskNo();
+CarbonProperties.getInstance().addProperty(tempLocationKey, 
storeLocation);
+CarbonProperties.getInstance()
+.addProperty(CarbonCommonConstants.STORE_LOCATION_HDFS, 
hdfsStoreLocation);
+// CarbonProperties.getInstance().addProperty("store_output_location", 
outPutLoc);
+CarbonProperties.getInstance().addProperty("send.signal.load", 
"false");
+
+CarbonTable carbonTable = 
loadModel.getCarbonDataLoadSchema().getCarbonTable();
+AbsoluteTableIdentifier identifier =
+carbonTable.getAbsoluteTableIdentifier();
+configuration.setTableIdentifier(identifier);
+String csvHeader = loadModel.getCsvHeader();
+String csvFileName = null;
+if (csvHeader != null && !csvHeader.isEmpty()) {
+  
configuration.setHeader(CarbonDataProcessorUtil.getColumnFields(csvHeader, 
","));
+} else {
+  CarbonFile csvFile =
+  
CarbonDataProcessorUtil.getCsvFileToRead(loadModel.getFactFilesToProcess().get(0));
--- End diff --

The CSV file can also exist inside HDFS; that's why we are creating a CarbonFile.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #298: [CARBONDATA-383]Optimize hdfsStoreLo...

2016-11-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/298


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (CARBONDATA-396) Implement test cases for datastorage package

2016-11-09 Thread Anurag Srivastava (JIRA)
Anurag Srivastava created CARBONDATA-396:


 Summary: Implement test cases for datastorage package
 Key: CARBONDATA-396
 URL: https://issues.apache.org/jira/browse/CARBONDATA-396
 Project: CarbonData
  Issue Type: Test
Reporter: Anurag Srivastava
Priority: Trivial






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)