[GitHub] incubator-carbondata pull request #313: [CARBONDATA-405]Fixed Data load fail...

2016-11-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/313#discussion_r8753
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataframe/DataFrameTestCase.scala
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.dataframe
+
+import java.io.File
+
+import org.apache.spark.sql.{DataFrame, Row, SaveMode}
+import org.apache.spark.sql.common.util.CarbonHiveContext._
+import org.apache.spark.sql.common.util.{CarbonHiveContext, QueryTest}
+import org.scalatest.BeforeAndAfterAll
+
+/**
+ * Test Class for hadoop fs relation
--- End diff --

ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: GC problem and performance refine problem

2016-11-10 Thread An Lan
Hi Kumar Vishal,

1.   Create table DDL:

CREATE TABLE IF NOT EXISTS Table1
(h Int, g Int, d String, f Int, e Int,
a Int, b Int, …(extra near 300 columns))
STORED BY 'org.apache.carbondata.format'
TBLPROPERTIES(
"NO_INVERTED_INDEX"="a",
"NO_INVERTED_INDEX"="b",

…(extra near 300 columns)

"DICTIONARY_INCLUDE"="a",
"DICTIONARY_INCLUDE"="b",

…(extra near 300 columns)

)

2 & 3. There are more than hundreds of nodes in the cluster, but the
cluster is shared with other applications. At times, when enough nodes are
free, we will get 100 distinct nodes.

4.  Below is a statistic of task times during one query, with distinct
nodes marked:

[image: inline image 1]




2016-11-10 23:52 GMT+08:00 Kumar Vishal :

> Hi Anning Luo,
>
> Can you please provide the details below.
>
> 1. Create table DDL.
> 2. Number of nodes in your cluster setup.
> 3. Number of executors per node.
> 4. Query statistics.
>
> Please find my comments in bold.
>
> Problem:
> 1.  GC problem. We suffer 20%~30% GC time for
> some tasks in the first stage, even after a lot of parameter refinement.
> We now use G1 GC on Java 8; GC time doubles if we use CMS. The main GC
> time is spent on young-generation GC. Almost half the memory of the young
> generation is copied to the old generation. It seems many objects live
> longer than one GC period and their space is not reused (as concurrent GC
> will release it later). When we use a large Eden (>=1G, for example), a
> single GC takes seconds. If we set Eden small (256M, for example), a
> single GC takes hundreds of milliseconds, but GCs are more frequent and
> the total is still seconds. Is there any way to lessen the GC time? (We
> don't consider the first query and second query in this case.)
>
> *How many nodes are present in your cluster setup? If nodes are few, please
> reduce the number of executors per node.*
>
> 2.  Performance refinement problem. The number of rows
> after filtering is not uniform. Some nodes may be heavily loaded and spend
> more time than other nodes. The time of one task is 4s ~ 16s. Is there any
> method to refine it?
>
> 3.  Too long a time for the first and second query. I
> know the dictionary and some indexes need to be loaded the first time. But
> after trying the query below to preheat it, it still spends a lot of time.
> How can I preheat the query correctly?
> select Aarray, a, b, c… from Table1 where Aarray is
> not null and d = “sss” and e !=22 and f = 33 and g = 44 and h = 55
>
> *Currently we are working on first-time query improvement. For now you can
> run select count(*) or count(column), so all the blocks get loaded, and then
> you can run the actual query.*
>
> 4. Any other suggestion to lessen the query time?
>
>
> Some suggestions:
> The log from class QueryStatisticsRecorder gives me a good means
> to find the bottleneck, but it is not enough. There are still some metrics
> I think would be very useful:
> 1. Filter ratio, i.e. not only the result_size but also the original
> size, so we could know how much data is filtered.
> 2. IO time. The scan_blocks_time is not enough. If it is high,
> we know something is wrong, but not what caused the problem. The real IO
> time for data is not provided. As there may be several files for one
> partition, knowing whether the program is slowed by the datanode or the
> executor itself gives us intuition to find the problem.
> 3. The TableBlockInfo for a task. I log it myself when
> debugging. It tells me how many blocklets are local. The Spark web monitor
> just gives a locality level, but maybe only one blocklet is local.
>
>
> -Regards
> Kumar Vishal
>
> On Thu, Nov 10, 2016 at 8:55 PM, An Lan  wrote:
>
> > Hi,
> >
> > We are using CarbonData to build our table and running queries in
> > CarbonContext. We have some performance problems while refining the
> > system.
> >
> > *Background*:
> >
> > *cluster*:  100 executor,5 task/executor, 10G
> > memory/executor
> >
> > *data*:  60+GB(per one replica) as carbon
> data
> > format, 600+MB/file * 100 file, 300+columns, 300+million rows
> >
> > *sql example:*
> >
> >   select A,
> >
> >   sum(a),
> >
> >   sum(b),
> >
> >   sum(c),
> >
> >   …( extra 100 aggregation like
> > sum(column))
> >
> >   from Table1 LATERAL VIEW
> > explode(split(Aarray, ‘*;*’)) ATable AS A
> >
> >   where A is not null and d >
> “ab:c-10”
> > and d < “h:0f3s” and e!=10 and f=22 and g=33 and h=44 GROUP BY A
> >
> > *target query time*:   <10s
> >
> > *current query time*: 15s ~ 25s
> >
> > *scene:* OLAP system. <100 queries every day.
> > Concurrency number is 

Re: [VOTE] Apache CarbonData 0.2.0-incubating release

2016-11-10 Thread Jean-Baptiste Onofré

+1 (binding)

Regards
JB

On 11/10/2016 12:17 AM, Liang Chen wrote:

Hi all,

I submit the CarbonData 0.2.0-incubating to your vote.

Release Notes:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12337896

Staging Repository:
https://repository.apache.org/content/repositories/orgapachecarbondata-1006

Git Tag:
carbondata-0.2.0-incubating

Please vote to approve this release:
[ ] +1 Approve the release
[ ] -1 Don't approve the release (please provide specific comments)

This vote will be open for at least 72 hours. If this vote passes (we need
at least 3 binding votes, meaning three votes from the PPMC), I will
forward to gene...@incubator.apache.org for the IPMC votes.

Here is my vote : +1 (binding)

Regards
Liang



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


[GitHub] incubator-carbondata pull request #305: [CARBONDATA-393] implement test case...

2016-11-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/305#discussion_r87540775
  
--- Diff: 
core/src/test/java/org/apache/carbondata/core/keygenerator/mdkey/NumberCompressorUnitTest.java
 ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.core.keygenerator.mdkey;
+
+import org.junit.Test;
+
+import static junit.framework.Assert.assertEquals;
+
+
+public class NumberCompressorUnitTest {
+
+private NumberCompressor numberCompressor;
+
+
+@Test
+public void testCompress() throws Exception {
+int cardinality = 10;
+numberCompressor = new NumberCompressor(cardinality);
+byte[] expected = new byte[]{2, 86, 115};
+int[] keys = new int[]{2, 5, 6, 7, 3};
+byte[] result = numberCompressor.compress(keys);
+for (int i = 0; i < result.length; i++) {
+assertEquals(expected[i], result[i]);
+}
+}
+
+@Test
--- End diff --

Test with boundary and negative conditions 
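
The boundary and negative conditions the reviewer asks for can be sketched with a small, self-contained example. Note that `packKeys` below is a hypothetical stand-in written only for illustration, not CarbonData's actual `NumberCompressor`:

```java
// Illustrative sketch only: a hypothetical bit-packer standing in for
// NumberCompressor, to show how boundary and negative-condition tests look.
public class BoundaryTestSketch {

    // Packs each key into `bitsPerKey` bits, most significant bit first.
    static byte[] packKeys(int[] keys, int bitsPerKey) {
        byte[] out = new byte[(keys.length * bitsPerKey + 7) / 8];
        int bitPos = 0;
        for (int key : keys) {
            // Negative or too-large keys cannot be represented: reject them.
            if (key < 0 || key >= (1 << bitsPerKey)) {
                throw new IllegalArgumentException("key out of range: " + key);
            }
            for (int b = bitsPerKey - 1; b >= 0; b--, bitPos++) {
                if (((key >> b) & 1) == 1) {
                    out[bitPos / 8] |= (byte) (1 << (7 - bitPos % 8));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Boundary: smallest (0) and largest (15) values representable in 4 bits.
        byte[] packed = packKeys(new int[]{0, 15}, 4);
        if (packed.length != 1 || packed[0] != 0x0F) throw new AssertionError();

        // Negative condition: an out-of-range key must be rejected.
        boolean rejected = false;
        try {
            packKeys(new int[]{-1}, 4);
        } catch (IllegalArgumentException e) {
            rejected = true;
        }
        if (!rejected) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The same pattern applies to the test above: add cases at the smallest and largest representable key for the chosen cardinality, plus cases that must fail.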




[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-10 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r87540836
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java
 ---
@@ -177,6 +178,13 @@ public ValidAndInvalidSegmentsInfo 
getValidAndInvalidSegments() throws IOExcepti
   }
 
 }
+
+// remove entry in the segment index if there are invalid segments
+if (listOfInvalidSegments.size() > 0) {
--- End diff --

ok, modified




[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-10 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r87540816
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java
 ---
@@ -177,6 +178,13 @@ public ValidAndInvalidSegmentsInfo 
getValidAndInvalidSegments() throws IOExcepti
   }
 
 }
+
+// remove entry in the segment index if there are invalid segments
+if (listOfInvalidSegments.size() > 0) {
--- End diff --

ok, modified




[GitHub] incubator-carbondata pull request #305: [CARBONDATA-393] implement test case...

2016-11-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/305#discussion_r87540550
  
--- Diff: 
core/src/test/java/org/apache/carbondata/core/keygenerator/mdkey/BitsUnitTest.java
 ---
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.core.keygenerator.mdkey;
+
+
+import org.junit.Test;
+import static org.hamcrest.CoreMatchers.equalTo;
+import static org.hamcrest.MatcherAssert.assertThat;
+import static org.hamcrest.core.Is.is;
+
+public class BitsUnitTest {
+private Bits bits;
+
+@Test
+public void testGetKeyByteOffsets() throws Exception {
+int[] lens = new int[]{1, 2, 3};
+bits = new Bits(lens);
+int index = 2;
+int[] expected = new int[]{0, 0};
+int[] result = bits.getKeyByteOffsets(index);
+assertThat(result, is(equalTo(expected)));
+}
+
+@Test
+public void testGetWithIntKeys() throws Exception {
+int[] lens = new int[]{20, 35, 10};
+bits = new Bits(lens);
+long[] expected = new long[]{703687441812490L, 0};
--- End diff --

test with negative and boundary conditions




[GitHub] incubator-carbondata pull request #305: [CARBONDATA-393] implement test case...

2016-11-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/305#discussion_r87540505
  
--- Diff: 
core/src/test/java/org/apache/carbondata/core/keygenerator/mdkey/BitsUnitTest.java
 ---
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.core.keygenerator.mdkey;
+
+
+import org.junit.Test;
+import static org.hamcrest.CoreMatchers.equalTo;
+import static org.hamcrest.MatcherAssert.assertThat;
+import static org.hamcrest.core.Is.is;
+
+public class BitsUnitTest {
+private Bits bits;
+
+@Test
+public void testGetKeyByteOffsets() throws Exception {
+int[] lens = new int[]{1, 2, 3};
--- End diff --

Add more testcases with big values and also cover the boundary conditions 
in test cases.
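
To make expected values in such boundary cases easier to reason about, here is a self-contained sketch of an offset computation. It is a hypothetical reimplementation for illustration; the real `Bits` class may differ (for example, in how it aligns keys within bytes):

```java
// Illustrative sketch: compute which bytes a bit-packed key touches,
// assuming keys are packed back-to-back starting from bit 0.
public class KeyByteOffsetSketch {

    // Returns {firstByte, lastByte} occupied by key `index`.
    static int[] keyByteOffsets(int[] bitLens, int index) {
        int start = 0;
        for (int i = 0; i < index; i++) {
            start += bitLens[i];
        }
        int end = start + bitLens[index] - 1;
        return new int[]{start / 8, end / 8};
    }

    public static void main(String[] args) {
        // The case from the test above: lens {1, 2, 3}, index 2 -> {0, 0},
        // since all 6 bits fit inside byte 0.
        int[] r1 = keyByteOffsets(new int[]{1, 2, 3}, 2);
        if (r1[0] != 0 || r1[1] != 0) throw new AssertionError();

        // A boundary case crossing a byte border: lens {7, 2}, index 1
        // occupies bits 7..8, i.e. bytes 0 and 1.
        int[] r2 = keyByteOffsets(new int[]{7, 2}, 1);
        if (r2[0] != 0 || r2[1] != 1) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Byte-border-crossing inputs like the second case are exactly the boundary conditions worth covering in the real test.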




[GitHub] incubator-carbondata pull request #305: [CARBONDATA-393] implement test case...

2016-11-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/305#discussion_r87540197
  
--- Diff: 
core/src/test/java/org/apache/carbondata/core/keygenerator/columnar/impl/MultiDimKeyVarLengthVariableSplitGeneratorUnitTest.java
 ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.core.keygenerator.columnar.impl;
+
+
+import org.junit.Before;
+import org.junit.Test;
+import static junit.framework.Assert.assertEquals;
+import java.util.Arrays;
+
+public class MultiDimKeyVarLengthVariableSplitGeneratorUnitTest {
+
+private MultiDimKeyVarLengthVariableSplitGenerator 
multiDimKeyVarLengthVariableSplitGenerator;
+
+@Before
+public void setup() {
+int[] lens = new int[]{1, 2, 3, 4, 5, 7, 8, 9, 0, 9, 8, 7, 6, 5, 
4, 3};
+int[] dimSplit = new int[]{50, 30};
--- End diff --

Here we should give a proper `dimSplit` and add more test cases




[GitHub] incubator-carbondata pull request #303: [CARBONDATA-386] Unit test case for ...

2016-11-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/303#discussion_r87531522
  
--- Diff: 
core/src/test/java/org/apache/carbondata/core/util/CarbonMetadataUtilTest.java 
---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import mockit.Mock;
+import mockit.MockUp;
+import 
org.apache.carbondata.core.carbon.metadata.blocklet.index.BlockletBTreeIndex;
+import 
org.apache.carbondata.core.carbon.metadata.blocklet.index.BlockletIndex;
+import 
org.apache.carbondata.core.carbon.metadata.blocklet.index.BlockletMinMaxIndex;
+import org.apache.carbondata.core.carbon.metadata.index.BlockIndexInfo;
+import org.apache.carbondata.core.metadata.BlockletInfoColumnar;
+import org.apache.carbondata.format.BlockIndex;
+import org.apache.carbondata.format.ColumnSchema;
+import org.apache.carbondata.format.IndexHeader;
+import org.apache.carbondata.format.SegmentInfo;
+import org.junit.Test;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import static junit.framework.TestCase.*;
+import static 
org.apache.carbondata.core.util.CarbonMetadataUtil.getBlockIndexInfo;
+import static 
org.apache.carbondata.core.util.CarbonMetadataUtil.getIndexHeader;
+
+public class CarbonMetadataUtilTest {
+
--- End diff --

There are many methods to cover in CarbonMetadataUtil




[GitHub] incubator-carbondata pull request #295: [Carbondata-379] Scan package's unit...

2016-11-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/295#discussion_r87529444
  
--- Diff: 
core/src/test/java/org/apache/carbondata/scan/result/impl/NonFilterQueryScannedResultTest.java
 ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.scan.result.impl;
+
+import mockit.Mock;
+import mockit.MockUp;
+import org.apache.carbondata.scan.executor.infos.BlockExecutionInfo;
+import org.apache.carbondata.scan.model.QueryDimension;
+import org.apache.carbondata.scan.result.AbstractScannedResult;
+import org.junit.Before;
+import org.junit.Test;
+
+public class NonFilterQueryScannedResultTest {
+private static NonFilterQueryScannedResult nonFilterQueryScannedResult;
+
+@Before
+public  void setUp(){
+BlockExecutionInfo blockExecutionInfo = new BlockExecutionInfo();
+QueryDimension queryDimension[] = {new 
QueryDimension("dummyColumnName1"),new QueryDimension("dummyColumnName2")};
+blockExecutionInfo.setQueryDimensions(queryDimension);
+nonFilterQueryScannedResult = new 
NonFilterQueryScannedResult(blockExecutionInfo);
+
+}
+
+@Test
+public void testIsNullMeasureValue(){
--- End diff --

This test is doing nothing; please mock it properly. We are supposed to set 
`measureDataChunks` and then call this method.




Re: As planed, we are ready to make Apache CarbonData 0.2.0 release:

2016-11-10 Thread bill.zhou
+1 
Regards 
bill.zhou 

Liang Chen wrote
> Hi all
> 
> In the 0.2.0 version of CarbonData, there are major performance improvements
> like blocklet distribution, support for BZIP2-compressed files, and so on,
> added to enhance CarbonData performance significantly. Along with the
> performance improvements, there are new features added to enhance the
> compatibility and usability of CarbonData, like removing the thrift compiler
> dependency.
> 
> 
> I can be the release manager for this; can JB guide me to finish this release?
> 
> Thanks.
> 
> 
> Regards
> Liang





--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/As-planed-we-are-ready-to-make-Apache-CarbonData-0-2-0-release-tp2738p2861.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.


Re: [VOTE] Apache CarbonData 0.2.0-incubating release

2016-11-10 Thread jarray
+1 binding


On 11/11/2016 02:37, Venkata Gollamudi wrote:
+1

Regards,
Ramana

On Thu, Nov 10, 2016, 10:03 PM Jacky Li  wrote:

> +1 binding
>
> Regards,
> Jacky
>
> ---Original---
> From: "Aniket Adnaik"
> Date: 2016/11/10 14:43:49
> To: "dev";"chenliang613"<
> chenliang...@apache.org>;
> Subject: Re: [VOTE] Apache CarbonData 0.2.0-incubating release
>
>
> +1
>
> Regards,
> Aniket
>
> On 9 Nov 2016 3:17 p.m., "Liang Chen"  wrote:
>
> > Hi all,
> >
> > I submit the CarbonData 0.2.0-incubating to your vote.
> >
> > Release Notes:
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> > projectId=12320220&version=12337896
> >
> > Staging Repository:
> > https://repository.apache.org/content/repositories/
> > orgapachecarbondata-1006
> >
> > Git Tag:
> > carbondata-0.2.0-incubating
> >
> > Please vote to approve this release:
> > [ ] +1 Approve the release
> > [ ] -1 Don't approve the release (please provide specific comments)
> >
> > This vote will be open for at least 72 hours. If this vote passes (we
> need
> > at least 3 binding votes, meaning three votes from the PPMC), I will
> > forward to gene...@incubator.apache.org for  the IPMC votes.
> >
> > Here is my vote : +1 (binding)
> >
> > Regards
> > Liang
> >


[GitHub] incubator-carbondata pull request #295: [Carbondata-379] Scan package's unit...

2016-11-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/295#discussion_r87527367
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/complextypes/PrimitiveQueryType.java
 ---
@@ -166,6 +166,7 @@ public PrimitiveQueryType(String name, String 
parentname, int blockIndex,
   DirectDictionaryGenerator directDictionaryGenerator = 
DirectDictionaryKeyGeneratorFactory
   .getDirectDictionaryGenerator(dataType);
   actualData = 
directDictionaryGenerator.getValueFromSurrogate(surrgateValue);
+
--- End diff --

Please don't add a blank line here





[GitHub] incubator-carbondata pull request #277: [CARBONDATA-357] Add unit test for V...

2016-11-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/277#discussion_r87527053
  
--- Diff: 
core/src/test/java/org/apache/carbondata/core/util/ValueCompressionUtilTest.java
 ---
@@ -0,0 +1,546 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.core.util;
+
+import 
org.apache.carbondata.core.datastorage.store.compression.ValueCompressonHolder;
+import org.apache.carbondata.core.datastorage.store.compression.type.*;
+import org.junit.Test;
+
+import java.nio.ByteBuffer;
+
+import static junit.framework.TestCase.*;
+import static 
org.apache.carbondata.core.util.ValueCompressionUtil.DataType;
+
+public class ValueCompressionUtilTest {
+
+@Test
+public void testGetSize() {
+DataType[] dataTypes = 
{DataType.DATA_BIGINT,DataType.DATA_INT,DataType.DATA_BYTE,DataType.DATA_SHORT,DataType.DATA_FLOAT};
+int[] expectedSizes = {8,4,1,2,4};
+for(int i =0; i < dataTypes.length; i++) {
+
assertEquals(expectedSizes[i],ValueCompressionUtil.getSize(dataTypes[i]));
+}
+}
+
+@Test
+public void 
testToGetCompressedValuesWithCompressionTypeMin_MaxForDataInt() {
+double[] values = {20.121,21.223,22.345};
+int[] result = (int[]) 
ValueCompressionUtil.getCompressedValues(ValueCompressionUtil.COMPRESSION_TYPE.MAX_MIN,values,DataType.DATA_INT,22.3,3);
+int[] expectedResult = {2,1,0};
+for(int i=0; i < values.length; i++) {
+assertEquals(result[i], expectedResult[i]);
+}
+}
+
+@Test
+public void 
testToGetCompressedValuesWithCompressionTypeMin_MaxForDataByte() {
+double[] values = {20.121,21.223,22.345};
+byte[] result = (byte[]) 
ValueCompressionUtil.getCompressedValues(ValueCompressionUtil.COMPRESSION_TYPE.MAX_MIN,values,DataType.DATA_BYTE,22.345,3);
+byte[] expectedResult = {2,1,0};
+for(int i=0; i < values.length; i++) {
+assertEquals(result[i], expectedResult[i]);
+}
+}
+
+@Test
+public void 
testToGetCompressedValuesWithCompressionTypeMin_MaxForDataShort() {
+double[] values = {200.121,21.223,22.345};
+short[] result = (short[]) 
ValueCompressionUtil.getCompressedValues(ValueCompressionUtil.COMPRESSION_TYPE.MAX_MIN,values,DataType.DATA_SHORT,22.345,3);
+short[] expectedResult = {-177,1,0};
+for(int i=0; i < values.length; i++) {
+assertEquals(result[i], expectedResult[i]);
+}
+}
+
+@Test
+public void 
testToGetCompressedValuesWithCompressionTypeMin_MaxForDataLong() {
+double[] values = {20.121,21.223,22.345};
+long[] result = (long[]) 
ValueCompressionUtil.getCompressedValues(ValueCompressionUtil.COMPRESSION_TYPE.MAX_MIN,values,DataType.DATA_LONG,22.345,3);
+long[] expectedResult = {2,1,0};
+for(int i=0; i < values.length; i++) {
+assertEquals(result[i], expectedResult[i]);
+}
+}
+
+@Test
+public void 
testToGetCompressedValuesWithCompressionTypeMin_MaxForDataFloat() {
+double[] values = {20.121,21.223,22.345};
+float[] result = (float[]) 
ValueCompressionUtil.getCompressedValues(ValueCompressionUtil.COMPRESSION_TYPE.MAX_MIN,values,DataType.DATA_FLOAT,22.345,3);
+float[] expectedResult = {2.224f,1.122f,0f};
+for(int i=0; i < values.length; i++) {
+assertEquals(result[i], expectedResult[i]);
+}
+}
+
+@Test
+public void 
testToGetCompressedValuesWithCompressionTypeMin_MaxForDataDouble() {
+double[] values = {20.121,21.223,22.345};
+double[] result = (double[]) 
ValueCompressionUtil.getCompressedValues(ValueCompressionUtil.COMPRESSION_TYPE.MAX_MIN,values,DataType.DATA_DOUBLE,102.345,3);
+

[GitHub] incubator-carbondata pull request #277: [CARBONDATA-357] Add unit test for V...

2016-11-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/277#discussion_r87526104
  
--- Diff: 
core/src/test/java/org/apache/carbondata/core/util/ValueCompressionUtilTest.java
 ---
@@ -0,0 +1,546 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.core.util;
+
+import 
org.apache.carbondata.core.datastorage.store.compression.ValueCompressonHolder;
+import org.apache.carbondata.core.datastorage.store.compression.type.*;
+import org.junit.Test;
+
+import java.nio.ByteBuffer;
+
+import static junit.framework.TestCase.*;
+import static 
org.apache.carbondata.core.util.ValueCompressionUtil.DataType;
+
+public class ValueCompressionUtilTest {
+
+@Test
+public void testGetSize() {
+DataType[] dataTypes = 
{DataType.DATA_BIGINT,DataType.DATA_INT,DataType.DATA_BYTE,DataType.DATA_SHORT,DataType.DATA_FLOAT};
+int[] expectedSizes = {8,4,1,2,4};
+for(int i =0; i < dataTypes.length; i++) {
+
assertEquals(expectedSizes[i],ValueCompressionUtil.getSize(dataTypes[i]));
+}
+}
+
+@Test
+public void 
testToGetCompressedValuesWithCompressionTypeMin_MaxForDataInt() {
+double[] values = {20.121,21.223,22.345};
+int[] result = (int[]) 
ValueCompressionUtil.getCompressedValues(ValueCompressionUtil.COMPRESSION_TYPE.MAX_MIN,values,DataType.DATA_INT,22.3,3);
--- End diff --

The values being passed are wrong: you are passing decimal values while the
type passed is `MAX_MIN`, so it does not return the right result. Please pass
values appropriate to the compression type.
Please change the other datatypes as well.
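The max-min scheme the reviewer refers to can be sketched as follows. Note that `compressMaxMin` is a hypothetical helper written for illustration only, not the actual `ValueCompressionUtil` API:

```java
import java.util.Arrays;

// Minimal sketch of max-min delta encoding: each value is stored as
// (max - value), so the column maximum compresses to 0 and all deltas
// are small non-negative numbers that can fit a narrower datatype.
public class MaxMinSketch {
    static double[] compressMaxMin(double[] values, double max) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = max - values[i]; // delta from the column maximum
        }
        return out;
    }

    public static void main(String[] args) {
        double[] values = {20.0, 21.0, 25.0};
        // With max = 25.0 the deltas are {5.0, 4.0, 0.0}
        System.out.println(Arrays.toString(compressMaxMin(values, 25.0)));
    }
}
```

The point of the review comment is that test inputs should be chosen so the deltas round-trip exactly under the target datatype of the compression type being tested.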


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #313: [CARBONDATA-405]Fixed Data load fail...

2016-11-10 Thread Jay357089
Github user Jay357089 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/313#discussion_r87524058
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataframe/DataFrameTestCase.scala
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.dataframe
+
+import java.io.File
+
+import org.apache.spark.sql.{DataFrame, Row, SaveMode}
+import org.apache.spark.sql.common.util.CarbonHiveContext._
+import org.apache.spark.sql.common.util.{CarbonHiveContext, QueryTest}
+import org.scalatest.BeforeAndAfterAll
+
+/**
+ * Test Class for hadoop fs relation
--- End diff --

the comment does not describe this test class properly...




Re: [VOTE] Apache CarbonData 0.2.0-incubating release

2016-11-10 Thread 金铸

+1 binding


jinzhu


On 2016/11/11 0:33, Jacky Li wrote:

+1 binding

Regards,
Jacky

---Original---
From: "Aniket Adnaik"
Date: 2016/11/10 14:43:49
To: 
"dev";"chenliang613";
Subject: Re: [VOTE] Apache CarbonData 0.2.0-incubating release


+1

Regards,
Aniket

On 9 Nov 2016 3:17 p.m., "Liang Chen"  wrote:


Hi all,

I submit the CarbonData 0.2.0-incubating to your vote.

Release Notes:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?
projectId=12320220=12337896

Staging Repository:
https://repository.apache.org/content/repositories/
orgapachecarbondata-1006

Git Tag:
carbondata-0.2.0-incubating

Please vote to approve this release:
[ ] +1 Approve the release
[ ] -1 Don't approve the release (please provide specific comments)

This vote will be open for at least 72 hours. If this vote passes (we need
at least 3 binding votes, meaning three votes from the PPMC), I will
forward to gene...@incubator.apache.org for  the IPMC votes.

Here is my vote : +1 (binding)

Regards
Liang







Re: As planed, we are ready to make Apache CarbonData 0.2.0 release:

2016-11-10 Thread Venkata Gollamudi
+1

Regards,
Ramana

On Thu, Nov 10, 2016, 6:03 AM foryou2030  wrote:

> +1
> regards
> Gin
>
> Sent from my iPhone
>
> > On Nov 10, 2016, at 3:25 AM, Kumar Vishal  wrote:
> >
> > +1
> > -Regards
> > Kumar Vishal
> >
> >> On Nov 9, 2016 08:04, "Jacky Li"  wrote:
> >>
> >> +1
> >>
> >> Regards,
> >> Jacky
> >>
> >>> On Nov 9, 2016, at 9:05 AM, Jay <2550062...@qq.com> wrote:
> >>>
> >>> +1
> >>> regards
> >>> Jay
> >>>
> >>>
> >>>
> >>>
> >>> -- Original Message --
> >>> From: "向志强";;
> >>> Sent: Wednesday, Nov 9, 2016, 8:59 AM
> >>> To: "dev";
> >>>
> >>> Subject: Re: As planed, we are ready to make Apache CarbonData 0.2.0
> release:
> release:
> >>>
> >>>
> >>>
> >>> Not needing to install Thrift to build the project is so great.
> >>>
> >>> 2016-11-08 23:16 GMT+08:00 QiangCai :
> >>>
>  I look forward to releasing this version.
>  CarbonData improved query and load performance, and it is good news that
>  there is no need to install Thrift to build the project.
>  Btw, how many PRs were merged into this version?
> 
> 
> 
>  --
>  View this message in context: http://apache-carbondata-
>  mailing-list-archive.1130556.n5.nabble.com/As-planed-we-
>  are-ready-to-make-Apache-CarbonData-0-2-0-release-tp2738p2752.html
>  Sent from the Apache CarbonData Mailing List archive mailing list
> >> archive
>  at Nabble.com.
> >
>
>


[GitHub] incubator-carbondata pull request #270: [CARBONDATA-346] Add unit test for C...

2016-11-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/270#discussion_r87453971
  
--- Diff: 
core/src/test/java/org/apache/carbondata/core/util/CarbonUtilTest.java ---
@@ -18,18 +18,746 @@
  */
 package org.apache.carbondata.core.util;
 
-import junit.framework.TestCase;
+import mockit.Mock;
+import mockit.MockUp;
+import 
org.apache.carbondata.core.carbon.datastore.chunk.DimensionChunkAttributes;
+import 
org.apache.carbondata.core.carbon.datastore.chunk.impl.FixedLengthDimensionDataChunk;
+import org.apache.carbondata.core.carbon.metadata.blocklet.DataFileFooter;
+import 
org.apache.carbondata.core.carbon.metadata.blocklet.datachunk.DataChunk;
+import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
+import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
+import 
org.apache.carbondata.core.carbon.metadata.schema.table.column.CarbonDimension;
+import 
org.apache.carbondata.core.carbon.metadata.schema.table.column.CarbonMeasure;
+import 
org.apache.carbondata.core.carbon.metadata.schema.table.column.ColumnSchema;
+import 
org.apache.carbondata.core.datastorage.store.columnar.ColumnGroupModel;
+import 
org.apache.carbondata.core.datastorage.store.compression.ValueCompressionModel;
+import 
org.apache.carbondata.core.datastorage.store.filesystem.LocalCarbonFile;
+import org.apache.carbondata.core.datastorage.store.impl.FileFactory;
+import org.apache.carbondata.core.keygenerator.mdkey.NumberCompressor;
+import org.apache.carbondata.core.metadata.ValueEncoderMeta;
+import org.apache.carbondata.scan.model.QueryDimension;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.glassfish.grizzly.memory.HeapBuffer;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
 import org.junit.Test;
+import org.pentaho.di.core.exception.KettleException;
+import java.io.*;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.util.ArrayList;
+import java.util.List;
+import static junit.framework.TestCase.*;
 
-public class CarbonUtilTest extends TestCase {
+public class CarbonUtilTest {
 
-  @Test public void testGetBitLengthForDimensionGiveProperValue() {
-int[] cardinality = { 10, 1, 1, 1, 2, 3 };
-int[] dimensionBitLength =
-CarbonUtil.getDimensionBitLength(cardinality, new int[] { 1, 1, 3, 
1 });
-int[] expectedOutPut = { 8, 8, 14, 2, 8, 8 };
-for (int i = 0; i < dimensionBitLength.length; i++) {
-  assertEquals(expectedOutPut[i], dimensionBitLength[i]);
+@BeforeClass
+public static void setUp() throws Exception{
+new 
File("../core/src/test/resources/testFile.txt").createNewFile();
+new File("../core/src/test/resources/testDatabase").mkdirs();
+
+}
+
+@Test
+public void testGetBitLengthForDimensionGiveProperValue() {
+int[] cardinality = {200, 1, 1, 1, 10, 3};
+int[] dimensionBitLength =
+CarbonUtil.getDimensionBitLength(cardinality, new int[]{1, 
1, 3, 1});
+int[] expectedOutPut = {8, 8, 14, 2, 8, 8};
+for (int i = 0; i < dimensionBitLength.length; i++) {
+assertEquals(expectedOutPut[i], dimensionBitLength[i]);
+}
+}
+
+@Test(expected = IOException.class)
+public void testCloseStreams() throws IOException {
+FileReader stream = new 
FileReader("../core/src/test/resources/testFile.txt");
+BufferedReader br = new BufferedReader(stream);
+CarbonUtil.closeStreams(br);
+br.ready();
+}
+
+@Test
+public void testToGetCardinality() {
+int result = CarbonUtil.getIncrementedCardinality(10);
--- End diff --

Add more checks here by passing a wider variety of input values.




[GitHub] incubator-carbondata pull request #269: [CARBONDATA-345] improve code-covera...

2016-11-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/269#discussion_r87446559
  
--- Diff: 
processing/src/test/java/org/apache/carbondata/lcm/locks/ZooKeeperLockingTest.java
 ---
@@ -41,103 +41,103 @@
  */
 public class ZooKeeperLockingTest {
--- End diff --

Why was this test case modified?




[GitHub] incubator-carbondata pull request #269: [CARBONDATA-345] improve code-covera...

2016-11-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/269#discussion_r87446084
  
--- Diff: pom.xml ---
@@ -6,9 +6,7 @@
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at
--- End diff --

Why is this update needed in the pom file?




[GitHub] incubator-carbondata pull request #269: [CARBONDATA-345] improve code-covera...

2016-11-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/269#discussion_r87444589
  
--- Diff: 
core/src/test/java/org/apache/carbondata/core/cache/dictionary/DictionaryByteArrayWrapperTest.java
 ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.core.cache.dictionary;
+
+import org.junit.Before;
+import org.junit.Test;
+
+public class DictionaryByteArrayWrapperTest {
+
+DictionaryByteArrayWrapper dictionaryByteArrayWrapper;
+
+@Before
+public void setup() {
+byte[] data = "Rahul".getBytes();
+dictionaryByteArrayWrapper = new DictionaryByteArrayWrapper(data);
--- End diff --

Please also include a test for the other constructor, which uses xxHash32.





[GitHub] incubator-carbondata pull request #267: [CARBONDATA-340] implement test case...

2016-11-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/267#discussion_r87442533
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/load/LoadMetadataDetails.java ---
@@ -150,7 +150,7 @@ public String getLoadStartTime() {
* return loadStartTime
* @return
*/
-  public long getLoadStartTimeAsLong() {
+  public Long getLoadStartTimeAsLong() {
--- End diff --

Why is it required to change this to `Long`?




[GitHub] incubator-carbondata pull request #313: [CARBONDATA-405]Fixed Data load fail...

2016-11-10 Thread ravipesala
GitHub user ravipesala opened a pull request:

https://github.com/apache/incubator-carbondata/pull/313

[CARBONDATA-405]Fixed Data load fail if dataframe is created with LONG 
datatype column

If the dataframe schema has a LONG datatype, carbon table creation fails 
because the long type is not converted to the supported bigint type. This is 
fixed in this PR.
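The mapping this fix implies can be sketched as below. Both `toCarbonType` and the string type names are illustrative stand-ins, not the actual converter code in the PR:

```java
public class TypeMapping {
    // Hypothetical sketch of Spark-to-Carbon datatype conversion. The point
    // of the fix: LongType must map to Carbon's bigint instead of failing.
    static String toCarbonType(String sparkType) {
        switch (sparkType) {
            case "LongType":    return "bigint"; // the mapping this PR adds
            case "IntegerType": return "int";
            case "StringType":  return "string";
            case "DoubleType":  return "double";
            default:
                throw new IllegalArgumentException("unsupported type: " + sparkType);
        }
    }

    public static void main(String[] args) {
        System.out.println(toCarbonType("LongType")); // bigint
    }
}
```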

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/incubator-carbondata 
dataframe-longtype-issue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/313.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #313


commit 78d9fe6f32fc010c6dd6115444872abd3b53338d
Author: ravipesala 
Date:   2016-11-10T16:46:59Z

Fixed Data load fail if dataframe is created with LONG datatype column






Re: [VOTE] Apache CarbonData 0.2.0-incubating release

2016-11-10 Thread Jacky Li
+1 binding

Regards,
Jacky

---Original---
From: "Aniket Adnaik"
Date: 2016/11/10 14:43:49
To: 
"dev";"chenliang613";
Subject: Re: [VOTE] Apache CarbonData 0.2.0-incubating release


+1

Regards,
Aniket

On 9 Nov 2016 3:17 p.m., "Liang Chen"  wrote:

> Hi all,
>
> I submit the CarbonData 0.2.0-incubating to your vote.
>
> Release Notes:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12320220=12337896
>
> Staging Repository:
> https://repository.apache.org/content/repositories/
> orgapachecarbondata-1006
>
> Git Tag:
> carbondata-0.2.0-incubating
>
> Please vote to approve this release:
> [ ] +1 Approve the release
> [ ] -1 Don't approve the release (please provide specific comments)
>
> This vote will be open for at least 72 hours. If this vote passes (we need
> at least 3 binding votes, meaning three votes from the PPMC), I will
> forward to gene...@incubator.apache.org for  the IPMC votes.
>
> Here is my vote : +1 (binding)
>
> Regards
> Liang
>

[GitHub] incubator-carbondata pull request #296: [CARBONDATA-382]Like Filter Query Op...

2016-11-10 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/296#discussion_r87425199
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/filter/FilterExpressionProcessor.java
 ---
@@ -286,6 +289,13 @@ private FilterResolverIntf 
getFilterResolverBasedOnExpressionType(
   return new RowLevelFilterResolverImpl(expression, 
isExpressionResolve, true,
   tableIdentifier);
 }
+if (currentCondExpression.getFilterExpressionType() == 
ExpressionType.CONTAINS
--- End diff --

For a dictionary column, do we need to create a row-level expression? I think 
for a dictionary column, creating an include filter for a LIKE query will be 
good enough: since we have the dictionary values, we can search the dictionary 
to get all the valid values and then apply the filter. Please correct me if I 
am wrong :)
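The idea in the comment above can be sketched like this. The class and method names are illustrative only, not CarbonData APIs: for a dictionary-encoded column, a `LIKE '%x%'` predicate can be rewritten as an include filter by scanning the (small) dictionary once and collecting the surrogate keys whose decoded value matches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DictionaryLikeFilter {
    // Collect the surrogate keys whose dictionary value contains `substring`;
    // these keys form the value list of an equivalent include filter.
    static List<Integer> surrogatesMatching(Map<Integer, String> dictionary, String substring) {
        List<Integer> keys = new ArrayList<>();
        for (Map.Entry<Integer, String> e : dictionary.entrySet()) {
            if (e.getValue().contains(substring)) {
                keys.add(e.getKey()); // this surrogate key passes the LIKE filter
            }
        }
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        Map<Integer, String> dict = new HashMap<>();
        dict.put(1, "phone");
        dict.put(2, "laptop");
        dict.put(3, "headphone");
        System.out.println(surrogatesMatching(dict, "phone")); // [1, 3]
    }
}
```

The scan then compares encoded surrogate keys directly, avoiding per-row string decoding that a row-level expression would require.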




[jira] [Created] (CARBONDATA-406) Empty Folder is created when data load from dataframe

2016-11-10 Thread Babulal (JIRA)
Babulal created CARBONDATA-406:
--

 Summary: Empty Folder is created when data load from dataframe
 Key: CARBONDATA-406
 URL: https://issues.apache.org/jira/browse/CARBONDATA-406
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Affects Versions: 0.1.0-incubating
Reporter: Babulal
Priority: Trivial


Load data from a dataframe to a carbon table with the tempCSV=false option.
The load succeeds, but an empty folder is created in HDFS.
Cluster size: 3 nodes.
Type: Standalone Spark


Steps
 val customSchema = StructType(Array(StructField("imei", StringType, true), 
   StructField("deviceInformationId", IntegerType, true),StructField("mac", 
StringType, true),StructField("productdate", TimestampType , true),
StructField("updatetime", TimestampType, true),StructField("gamePointId", 
DoubleType, true),StructField("contractNumber", DoubleType, true)   ));


val df = cc.read.format("com.databricks.spark.csv").option("header", 
"false").schema(customSchema).load("/opt/data/xyz/100_default_date_11_header.csv");

Start data loading 
scala> df.write.format("carbondata").option("tableName","mycarbon2").save();


Check Logs

leges:{}, groupPrivileges:null, rolePrivileges:null))
INFO  10-11 23:52:44,005 - Creating directory if it doesn't exist: 
hdfs://10.18.102.236:54310/opt/Carbon/Spark/spark/bin/null/bin/carbonshellstore/hivemetadata/mycarbon4
AUDIT 10-11 23:52:44,037 - [BLR107781][root][Thread-1]Table created with 
Database name [default] and Table name [mycarbon4]
INFO  10-11 23:52:44,040 - Successfully able to get the table metadata file lock


In the HDFS this Path is empty 
hdfs://10.18.102.236:54310/opt/Carbon/Spark/spark/bin/null/bin/carbonshellstore/hivemetadata/mycarbon4

Actual Store location is :- hdfs://10.18.102.236:54310/opt/Carbon/mystore

Expected: the empty folder should not be created. It seems to be created under 
SPARK_HOME/bin.
SPARK_HOME is /opt/Carbon/Spark/spark/bin




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: GC problem and performance refine problem

2016-11-10 Thread Kumar Vishal
Hi Anning Luo,

Can you please provide the details below?

1. Create table DDL.
2. Number of nodes in your cluster setup.
3. Number of executors per node.
4. Query statistics.

Please find my comments in bold.

Problem:
1.  GC problem. We suffer 20%~30% GC time for some tasks in the first
stage even after a lot of parameter tuning. We now use G1 GC on Java 8; GC
time doubles if we use CMS. The GC time is mainly spent on young-generation
GC, and almost half the memory of the young generation is copied to the old
generation. It seems many objects live longer than one GC cycle, so their
space cannot be reused immediately (the concurrent GC releases it later).
When we use a large Eden (>=1 GB, for example), a single GC takes seconds. If
we set Eden small (256 MB, for example), a single GC takes hundreds of
milliseconds, but GCs are more frequent and the total is still seconds. Is
there any way to lessen the GC time? (We don’t consider the first query and
second query in this case.)

*How many nodes are present in your cluster setup? If there are only a few
nodes, please reduce the number of executors per node.*

2.  Performance tuning problem. The row count after filtering is not
uniform: some nodes may be heavily loaded and spend more time than others.
The time for one task ranges from 4s to 16s. Is there any method to improve
this?

3.  The first and second queries take too long. I know the dictionary
and some indexes need to be loaded the first time, but even after trying the
query below to warm things up, it still spends a lot of time. How can I
preheat correctly?
select Aarray, a, b, c… from Table1 where Aarray is
not null and d = “sss” and e !=22 and f = 33 and g = 44 and h = 55

*Currently we are working on improving first-query time. For now you can
run `select count(*)` or `count(column)` so that all the blocks get loaded,
and then run the actual query.*

4. Any other suggestions to lessen the query time?


Some suggestions:
The log from the QueryStatisticsRecorder class gives me a good means
to find the bottleneck, but it is not enough. There are still some metrics I
think would be very useful:
1. Filter ratio, i.e. not only result_size but also the original
size, so we know how much data was filtered.
2. IO time. scan_blocks_time is not enough: if it is high, we know
something is wrong, but not what caused the problem. The real IO time for the
data is not provided. As there may be several files for one partition,
knowing whether the slowness is caused by the datanode or the executor itself
would give us intuition to find the problem.
3. The TableBlockInfo for each task. I log it myself when debugging;
it tells me how many blocklets are node-local. The Spark web UI just gives a
locality level, but maybe only one blocklet is actually local.


-Regards
Kumar Vishal

On Thu, Nov 10, 2016 at 8:55 PM, An Lan  wrote:

> Hi,
>
> We are using carbondata to build our table and running query in
> CarbonContext. We have some performance problem during refining the system.
>
> *Background*:
>
> *cluster*:  100 executor,5 task/executor, 10G
> memory/executor
>
> *data*:  60+GB(per one replica) as carbon data
> format, 600+MB/file * 100 file, 300+columns, 300+million rows
>
> *sql example:*
>
>   select A,
>
>   sum(a),
>
>   sum(b),
>
>   sum(c),
>
>   …( extra 100 aggregation like
> sum(column))
>
>   from Table1 LATERAL VIEW
> explode(split(Aarray, ‘*;*’)) ATable AS A
>
>   where A is not null and d > “ab:c-10”
> and d < “h:0f3s” and e!=10 and f=22 and g=33 and h=44 GROUP BY A
>
> *target query time*:   <10s
>
> *current query time*: 15s ~ 25s
>
> *scene:* OLAP system. <100 queries every day.
> Concurrency number is not high. Most time cpu is idle, so this service will
> run with other program. The service will run for long time. We could not
> occupy a very large memory for every executor.
>
> *refine*:  I have build index and dictionary on
> d, e, f, g, h and build dictionary on all other aggregation columns(i.e. a,
> b, c, …100+ columns). And make sure there is one segment for total data. I
> have open the speculation(quantile=0.5, interval=250, multiplier=1.2).
>
> Time is mainly spent on first stage before shuffling. As 95% data will be
> filtered out, the shuffle process spend little time. In first stage, most
> task complete in less than 10s. But there still be near 50 tasks longer
> than 10s. Max task time in one query may be 12~16s.
>
> *Problem:*
>
> 1.  GC problem. We suffer a 20%~30% GC time for some task in first
> stage after a lot of parameter refinement. We now use G1 GC in java8. GC
> 

[GitHub] incubator-carbondata pull request #312: [CARBONDATA-404] Fixing dataframe sa...

2016-11-10 Thread ravipesala
GitHub user ravipesala opened a pull request:

https://github.com/apache/incubator-carbondata/pull/312

[CARBONDATA-404] Fixing dataframe save when loading in cluster mode.

Currently dataframe save writes temp csv in local folder so it fails in 
cluster mode. This PR changes the temp csv location to store path.
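The failure mode described above is easy to reproduce outside Spark: a relative path such as `./tempCSV` resolves against each JVM's working directory, which differs per node in a cluster, so executors cannot see a file the driver wrote. A minimal illustration (plain Java, not CarbonData code):

```java
import java.io.File;

public class RelativePathDemo {
    public static void main(String[] args) {
        // A relative path resolves against the current working directory
        // ("user.dir" system property), which is not the same on every
        // node of a cluster -- hence the move to an absolute store path.
        File f = new File("./tempCSV");
        System.out.println("resolves to: " + f.getAbsolutePath());
        System.out.println("exists here: " + f.exists());
    }
}
```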

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/incubator-carbondata 
dataframe-csv-issue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/312.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #312


commit ea94f9aeebe026c29c6c4f976ef268bb29517da7
Author: ravipesala 
Date:   2016-11-10T15:03:32Z

Fixing dataframe save when loading in cluster mode.






[jira] [Created] (CARBONDATA-403) add example for data load without using kettle

2016-11-10 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-403:
---

 Summary: add example for data load without using kettle
 Key: CARBONDATA-403
 URL: https://issues.apache.org/jira/browse/CARBONDATA-403
 Project: CarbonData
  Issue Type: Improvement
Reporter: Jacky Li


add example for data load without using kettle





[GitHub] incubator-carbondata pull request #311: add example for data load without us...

2016-11-10 Thread jackylk
GitHub user jackylk opened a pull request:

https://github.com/apache/incubator-carbondata/pull/311

add example for data load without using kettle

In this PR, example SQL and dataframe usage is added for loading data 
without kettle

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata no-kettle-example

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/311.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #311


commit 65d5dec6af97be79924a7c43a48c7a7baa540c7b
Author: jackylk 
Date:   2016-11-10T15:20:52Z

add no-kettle loading example






[jira] [Created] (CARBONDATA-404) Data loading from DataFrame to carbon table is FAILED

2016-11-10 Thread Babulal (JIRA)
Babulal created CARBONDATA-404:
--

 Summary: Data loading from DataFrame to carbon table is FAILED
 Key: CARBONDATA-404
 URL: https://issues.apache.org/jira/browse/CARBONDATA-404
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Affects Versions: 0.1.0-incubating
Reporter: Babulal


Data loading FAILED when   Loading data from DataFrame with tempCSV option 
=true(Default option ) in 3 Node cluster .

Steps
 val customSchema = StructType(Array(StructField("imei", StringType, true), 
   StructField("deviceInformationId", IntegerType, true),StructField("mac", 
StringType, true),StructField("productdate", TimestampType , true),
StructField("updatetime", TimestampType, true),StructField("gamePointId", 
DoubleType, true),StructField("contractNumber", DoubleType, true)   ));


val df = cc.read.format("com.databricks.spark.csv").option("header", 
"false").schema(customSchema).load("/opt/data/xyz/100_default_date_11_header.csv");

Start data loading 
scala> df.write.format("carbondata").option("tableName","mycarbon2").save();
INFO  10-11 23:24:35,970 - main Query [
  CREATE TABLE IF NOT EXISTS DEFAULT.MYCARBON2
  (IMEI STRING, DEVICEINFORMATIONID INT, MAC STRING, PRODUCTDATE 
TIMESTAMP, UPDATETIME TIMESTAMP, GAMEPOINTID DOUBLE, CONTRACTNUMBER DOUBLE)
  STORED BY 'ORG.APACHE.CARBONDATA.FORMAT'
  ]
INFO  10-11 23:24:35,977 - Parsing command:
  CREATE TABLE IF NOT EXISTS default.mycarbon2
  (imei STRING, deviceInformationId INT, mac STRING, productdate 
TIMESTAMP, updatetime TIMESTAMP, gamePointId DOUBLE, contractNumber DOUBLE)
  STORED BY 'org.apache.carbondata.format'

INFO  10-11 23:24:35,978 - Parse Completed
INFO  10-11 23:24:36,227 - main Query [
  LOAD DATA INPATH './TEMPCSV'
  INTO TABLE DEFAULT.MYCARBON2
  OPTIONS ('FILEHEADER' = 
'IMEI,DEVICEINFORMATIONID,MAC,PRODUCTDATE,UPDATETIME,GAMEPOINTID,CONTRACTNUMBER')
  ]
INFO  10-11 23:24:36,233 - Successfully able to get the table metadata file lock
AUDIT 10-11 23:24:36,234 - [BLR107781][root][Thread-1]Dataload failed for 
default.mycarbon2. The input file does not exist: ./tempCSV
INFO  10-11 23:24:36,234 - main Successfully deleted the lock file 
/tmp/default/mycarbon2/meta.lock
INFO  10-11 23:24:36,234 - Table MetaData Unlocked Successfully after data load
org.apache.carbondata.processing.etl.DataLoadingException: The input file does 
not exist: ./tempCSV
at 
org.apache.spark.util.FileUtils$$anonfun$getPaths$1.apply$mcVI$sp(FileUtils.scala:66)


CSV DATA

1AA1,1,Mikaa1,2015-01-01 11:00:00,2015-01-01 13:00:00,198,260
1AA2,3,Mikaa2,2015-01-02 12:00:00,2015-01-01 14:00:00,278,230
1AA3,1,Mikaa1,2015-01-03 13:00:00,2015-01-01 15:00:00,2556,1
1AA4,10,Mikaa2,2015-01-04 14:00:00,2015-01-01 16:00:00,640,254
1AA5,10,Mikaa,2015-01-05 15:00:00,2015-01-01 17:00:00,980,256
1AA6,10,Mikaa,2015-01-06 16:00:00,2015-01-01 18:00:00,1,2378
1AA7,10,Mikaa,2015-01-07 17:00:00,2015-01-01 19:00:00,96,234
1AA8,9,max,2015-01-08 18:00:00,2015-01-01 20:00:00,89,236






GC problem and performance refine problem

2016-11-10 Thread An Lan
Hi,

We are using CarbonData to build our table and run queries in
CarbonContext. We have some performance problems while tuning the system.

*Background*:

*cluster*:  100 executor,5 task/executor, 10G
memory/executor

*data*:  60+GB(per one replica) as carbon data
format, 600+MB/file * 100 file, 300+columns, 300+million rows

*sql example:*

  select A,
         sum(a),
         sum(b),
         sum(c),
         ...(extra 100 aggregations like sum(column))
  from Table1 LATERAL VIEW explode(split(Aarray, ';')) ATable AS A
  where A is not null and d > 'ab:c-10' and d < 'h:0f3s'
    and e != 10 and f = 22 and g = 33 and h = 44
  GROUP BY A

*target query time*:   <10s

*current query time*: 15s ~ 25s

*scene:* OLAP system, with fewer than 100 queries per day.
Concurrency is low and the CPU is idle most of the time, so this service
will share machines with other programs. The service will run for a long
time, so we cannot dedicate a very large amount of memory to every executor.

*refine*:  I have built indexes and dictionaries on
d, e, f, g and h, and built dictionaries on all the other aggregation
columns (i.e. a, b, c, ... 100+ columns). I have made sure there is a single
segment for the whole data set, and I have enabled
speculation (quantile=0.5, interval=250, multiplier=1.2).

Time is mainly spent in the first stage, before shuffling. As 95% of the
data is filtered out, the shuffle itself takes little time. In the first
stage most tasks complete in less than 10s, but nearly 50 tasks still take
longer than that; the longest task in one query may run 12~16s.

*Problem:*

1.  GC problem. We see 20%~30% GC time for some tasks in the first
stage, even after a lot of parameter tuning. We now use the G1 collector on
Java 8; GC time doubles with CMS. Most of the GC time is spent in
young-generation collections, and almost half of the young generation
survives and is copied to the old generation. It seems that many objects
live longer than one GC period and their space is not reused (the concurrent
GC releases it only later). With a large Eden (>=1G, for example), a single
GC takes seconds. With a small Eden (256M, for example), a single GC takes
hundreds of milliseconds, but collections are more frequent and the total is
still seconds. Is there any way to reduce the GC time? (We do not count the
first and second queries in this case.)
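
For concreteness, the kind of G1 flag set we mean is sketched below. The
specific values (pause target, IHOP, region size) are assumptions to
experiment with, not tested recommendations for this workload:

```scala
// Sketch of a candidate G1 flag set for the executor JVMs. The numeric
// values are illustrative starting points, not verified recommendations.
val g1Options: String = Seq(
  "-XX:+UseG1GC",
  "-XX:MaxGCPauseMillis=200",              // per-collection pause-time target
  "-XX:InitiatingHeapOccupancyPercent=35", // start concurrent marking earlier
  "-XX:G1HeapRegionSize=16m"               // larger regions for big scan buffers
).mkString(" ")

// Passed to every executor JVM via the standard Spark configuration key:
println(s"spark.executor.extraJavaOptions=$g1Options")
```

Lowering MaxGCPauseMillis trades throughput for shorter individual pauses,
which is the same trade-off we see between the large and small Eden settings.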

2.   Performance tuning problem. The number of rows left after filtering is
not uniform, so some nodes are more heavily loaded and spend more time than
others: task times range from 4s to 16s. Is there any way to even this out?

3.   The first and second queries take too long. I know the dictionary
and some indexes need to be loaded the first time, but even after running
the query below to warm them up, queries still take a lot of time. How can
I warm up the caches correctly?

  select Aarray, a, b, c, ... from Table1 where Aarray is not null
  and d = 'sss' and e != 22 and f = 33 and g = 44 and h = 55
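
A minimal warm-up sketch of what we tried: one query per filter column, on
the assumption (not verified CarbonData behaviour) that touching each column
once is enough to fault its dictionary and index into cache. Only the query
construction runs here; `cc` stands for the CarbonContext:

```scala
// Hypothetical warm-up: touch each filter column once so that its
// dictionary and index are loaded before real user queries arrive.
// Whether one count(*) per column is enough to populate the cache is
// an assumption, not verified CarbonData behaviour.
val filterColumns = Seq("d", "e", "f", "g", "h")

val warmUpQueries: Seq[String] = filterColumns.map { col =>
  s"select count(*) from Table1 where $col is not null"
}

warmUpQueries.foreach(println)
// In the service itself (cc being the CarbonContext):
//   warmUpQueries.foreach(q => cc.sql(q).collect())
```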

4.   Are there any other suggestions to reduce the query time?



Some suggestions:

The log produced by the QueryStatisticsRecorder class gives me a good way
to find the bottleneck, but it is not enough. There are a few more metrics
I think would be very useful:

1. Filter ratio, i.e. not only result_size but also the original
size, so we can see how much data was filtered out.

2. IO time. scan_blocks_time is not enough: if it is high we know
something is wrong, but not what caused it, because the real IO time for
the data is not reported. As there may be several files for one partition,
knowing whether the slowness comes from the datanode or from the executor
itself would give us a starting point for finding the problem.

3. The TableBlockInfo for each task. I logged it myself while
debugging; it tells me how many blocklets are actually local. The Spark web
UI only reports a locality level, while perhaps only one blocklet is local.
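
To make suggestion 1 concrete: a filter-ratio metric only needs the
pre-filter row count recorded next to result_size. `ScanStats` below is a
hypothetical holder for illustration, not an existing CarbonData class:

```scala
// Hypothetical metric holder: with the original (pre-filter) row count
// recorded alongside result_size, the recorder could report how much
// data each scan filtered out.
case class ScanStats(originRows: Long, resultRows: Long) {
  def filterRatio: Double =
    if (originRows == 0L) 0.0
    else 1.0 - resultRows.toDouble / originRows.toDouble
}

// With the rough numbers from this thread (~95% of rows filtered out):
val stats = ScanStats(originRows = 300000000L, resultRows = 15000000L)
println(f"filter ratio: ${stats.filterRatio * 100}%.1f%%")
```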


-

Anning Luo

*HULU*

Email: anning@hulu.com

lanan...@gmail.com


join mail list

2016-11-10 Thread Anning Luo
As above



join mail list

2016-11-10 Thread Anning Luo
 



[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2] Data load integration...

2016-11-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/263


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #290: [CARBONDATA-371] Write unit test for...

2016-11-10 Thread harmeetsingh0013
Github user harmeetsingh0013 closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/290




Re: [VOTE] Apache CarbonData 0.2.0-incubating release

2016-11-10 Thread QiangCai
+1



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/VOTE-Apache-CarbonData-0-2-0-incubating-release-tp2823p2836.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.


[GitHub] incubator-carbondata pull request #310: [CARBONDATA-401][WIP] One Pass Load

2016-11-10 Thread lion-x
GitHub user lion-x opened a pull request:

https://github.com/apache/incubator-carbondata/pull/310

[CARBONDATA-401][WIP] One Pass Load

# Why raise this PR?

# How to do?

- [ ] Pass the useOnePass option from the Load statement into 
CarbonCSVBasedSeqGenStep.java

- [ ] 

- [ ] 

- [ ] 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lion-x/incubator-carbondata onePass

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/310.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #310


commit 611b0135425691eeb8fbc19469485834d23b2008
Author: lion-x 
Date:   2016-11-10T09:08:42Z

transonepass



