problem with branch-1.1

2017-06-23 Thread sunerhan1...@sina.com
hello,
I tried to use branch-1.1 under HDP 2.6.0 with Spark 2.1.0 and ran into several problems.
1. Build: used "mvn package -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 -Phadoop-2.7.2", and while compiling the core module it threw a class-not-found error for org.apache.thrift.TBase. I modified pom.xml in core to add the libthrift dependency, and it compiled successfully.
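For reference, the workaround was a dependency entry in core/pom.xml along these lines (the libthrift version shown is an assumption; use whichever version your Thrift classes come from):

<dependency>
  <groupId>org.apache.thrift</groupId>
  <artifactId>libthrift</artifactId>
  <version>0.9.3</version> <!-- assumed version, adjust to your environment -->
</dependency>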
2. Running in spark-shell:
spark-shell --jars carbondata_2.11-1.1.1-SNAPSHOT-shade-hadoop2.7.2.jar
scala> import org.apache.spark.sql.SparkSession
scala> import org.apache.spark.sql.CarbonSession._
scala> val cc = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession(hdfs path)
and got the error:
java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/CatalystConf
  at org.apache.spark.sql.hive.CarbonSessionState.analyzer$lzycompute(CarbonSessionState.scala:127)
  at org.apache.spark.sql.hive.CarbonSessionState.analyzer(CarbonSessionState.scala:126)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
  at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
  ... 52 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  ... 59 more
I checked the spark-catalyst jar and did not find the class CatalystConf in it.
3. Used the Apache Spark catalyst jar and reran with
spark-shell --jars carbondata_2.11-1.1.1-SNAPSHOT-shade-hadoop2.7.2.jar,spark-catalyst_2.11-2.1.0.jar
and got the same error:
java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/CatalystConf
  at org.apache.spark.sql.hive.CarbonSessionState.analyzer$lzycompute(CarbonSessionState.scala:127)
  at org.apache.spark.sql.hive.CarbonSessionState.analyzer(CarbonSessionState.scala:126)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
  at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
  ... 52 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  ... 59 more
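One guess about why step 3 did not help: jars passed via --jars go into a child classloader, while Spark's own catalyst classes are resolved from the installation's classpath on the driver, so the extra spark-catalyst jar probably cannot replace them. If the HDP spark-catalyst jar really lacks CatalystConf, prepending the Apache jar to the driver classpath might behave differently (untested; the jar path is assumed to be in the working directory):

spark-shell --driver-class-path spark-catalyst_2.11-2.1.0.jar --jars carbondata_2.11-1.1.1-SNAPSHOT-shade-hadoop2.7.2.jar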



sunerhan1...@sina.com


[jira] [Created] (CARBONDATA-1219) Documentation - not supported high.cardinality.row.count.percentage

2017-06-23 Thread Gururaj Shetty (JIRA)
Gururaj Shetty created CARBONDATA-1219:
--

 Summary: Documentation - not supported 
high.cardinality.row.count.percentage
 Key: CARBONDATA-1219
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1219
 Project: CarbonData
  Issue Type: Bug
Reporter: Gururaj Shetty
Assignee: Srigopal Mohanty
Priority: Minor


The parameter high.cardinality.row.count.percentage is no longer supported; it should be removed from the documentation along with all references to it.
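For reference, it appears in carbon.properties as a line like the following (the value is a made-up placeholder, shown only to identify the entry to remove from the docs):

high.cardinality.row.count.percentage=80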





Re: problem with branch-1.1

2017-06-23 Thread Erlu Chen
Hi,

Please try "mvn package -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 -Phadoop-2.7.2" with Hadoop 2.7.2 and Spark 2.1.

I have just tested it; it compiles OK.

[INFO] Reactor Summary:
[INFO]
[INFO] Apache CarbonData :: Parent ............ SUCCESS [  1.657 s]
[INFO] Apache CarbonData :: Common ............ SUCCESS [  1.870 s]
[INFO] Apache CarbonData :: Core .............. SUCCESS [ 25.003 s]
[INFO] Apache CarbonData :: Processing ........ SUCCESS [  1.941 s]
[INFO] Apache CarbonData :: Hadoop ............ SUCCESS [  2.017 s]
[INFO] Apache CarbonData :: Spark Common ...... SUCCESS [ 20.622 s]
[INFO] Apache CarbonData :: Spark2 ............ SUCCESS [ 39.956 s]
[INFO] Apache CarbonData :: Spark Common Test . SUCCESS [  4.024 s]
[INFO] Apache CarbonData :: Assembly .......... SUCCESS [  3.400 s]
[INFO] Apache CarbonData :: Spark2 Examples ... SUCCESS [  9.718 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:50 min
[INFO] Finished at: 2017-06-23T17:55:28+08:00
[INFO] Final Memory: 83M/860M
[INFO] ------------------------------------------------------------------------
bogon:carbondata erlu$ git branch
* branch-1.1

Regards.
Chenerlu




--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/problem-with-branch-1-1-tp16004p16016.html
Sent from the Apache CarbonData Dev Mailing List archive at Nabble.com.


[jira] [Created] (CARBONDATA-1220) Decimal values are not displayed correctly in presto

2017-06-23 Thread Geetika Gupta (JIRA)
Geetika Gupta created CARBONDATA-1220:
-

 Summary: Decimal values are not displayed correctly in presto
 Key: CARBONDATA-1220
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1220
 Project: CarbonData
  Issue Type: Bug
  Components: presto-integration
Reporter: Geetika Gupta
Priority: Minor
 Attachments: decimaldata.csv

I created a table in CarbonData containing decimal values. When I tried to display all the rows in Presto, the scale of the values was changed.

Below are the details of the query:

Create table in carbondata:
create table decimalOperatorCheck(name String, ids Decimal(10,2)) stored by 
'carbondata';

Load data:
load data inpath 'hdfs://localhost:54311/testFiles/decimaldata.csv' into table 
decimalOperatorCheck options('delimiter'=',','fileheader'='name,ids');

Output in Carbondata:

0: jdbc:hive2://localhost:1> select * from decimaloperatorcheck;
+---------+-----------+--+
|  name   |    ids    |
+---------+-----------+--+
| Alex    | 123.45    |
| Josh    | 233.34    |
| Justin  | 11.90     |
| Ryan    | 12345.56  |
| name    | NULL      |
+---------+-----------+--+
5 rows selected (21.983 seconds)

Output in presto:

presto:sparkdata> select * from decimaloperatorcheck;
  name  |  ids   
--------+--------
 Alex   | 1.23   
 Josh   | 2.33   
 Justin | 0.12   
 Ryan   | 123.46 
 name   | NULL   
(5 rows)
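Judging only from the numbers above (a guess from the data, not a confirmed root cause), every Presto value comes back as round(stored / 10^scale, scale), i.e. as if the declared scale of 2 were applied twice somewhere on the read path. A small self-contained Scala check of that pattern (illustrative only, not CarbonData or Presto code):

// Reproduce the suspected double application of the decimal scale:
// each Presto value equals round(stored / 10^scale, scale).
val scale = 2
for (v <- Seq(BigDecimal("123.45"), BigDecimal("233.34"),
              BigDecimal("11.90"), BigDecimal("12345.56"))) {
  val twiceScaled = (v / BigDecimal(10).pow(scale))
    .setScale(scale, BigDecimal.RoundingMode.HALF_UP)
  println(s"$v -> $twiceScaled")  // 123.45 -> 1.23, ..., 12345.56 -> 123.46
}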






[jira] [Created] (CARBONDATA-1221) DOCUMENTATION - Remove unsupported parameter

2017-06-23 Thread Gururaj Shetty (JIRA)
Gururaj Shetty created CARBONDATA-1221:
--

 Summary: DOCUMENTATION - Remove unsupported parameter
 Key: CARBONDATA-1221
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1221
 Project: CarbonData
  Issue Type: Bug
Reporter: Gururaj Shetty
Priority: Minor


The following two parameters are no longer supported and need to be removed from the documentation:

carbon.inmemory.record.size
no.of.cores.to.load.blocks.in.driver
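For reference, these would show up in carbon.properties as lines like the following (the values are made-up placeholders, shown only to identify the entries to remove from the docs):

carbon.inmemory.record.size=120000
no.of.cores.to.load.blocks.in.driver=4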





[jira] [Created] (CARBONDATA-1222) Residual files created from Update are not deleted after clean operation

2017-06-23 Thread panner selvam velmyl (JIRA)
panner selvam velmyl created CARBONDATA-1222:


 Summary: Residual files created from Update are not deleted after 
clean operation
 Key: CARBONDATA-1222
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1222
 Project: CarbonData
  Issue Type: Bug
  Components: spark-integration
Reporter: panner selvam velmyl
Priority: Minor


Spark SQL:

1. Create a table:
create table t_carbn01 (item_code string, item_name1 string) stored by 'carbondata';

2. Load data:
insert into t_carbn01 select 'a1','Phone';

3. Update the table:
update t_carbn01 set (item_name1)=('Mobile') where item_code ='a1';

4. Run clean files on the table:
clean files for table t_carbn01;

Expected output: clean files should remove the residual carbondata and delete-delta files.
Actual output: the residual files are not cleaned.





[jira] [Created] (CARBONDATA-1223) Fixing empty file creation in batch sort loading

2017-06-23 Thread dhatchayani (JIRA)
dhatchayani created CARBONDATA-1223:
---

 Summary: Fixing empty file creation in batch sort loading
 Key: CARBONDATA-1223
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1223
 Project: CarbonData
  Issue Type: Bug
Reporter: dhatchayani
Assignee: dhatchayani








[jira] [Created] (CARBONDATA-1224) Going out of memory if more segments are compacted at once in V3 format

2017-06-23 Thread Ravindra Pesala (JIRA)
Ravindra Pesala created CARBONDATA-1224:
---

 Summary: Going out of memory if more segments are compacted at 
once in V3 format
 Key: CARBONDATA-1224
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1224
 Project: CarbonData
  Issue Type: Bug
Reporter: Ravindra Pesala


In V3 format we read the whole blocklet into memory at once in order to save IO time. But this turns out to be more costly when many carbondata files are read in parallel.
For example, if we need to compact 50 segments, the compactor needs to open readers on all 50 segments to do a merge sort. Memory consumption is then too high if each reader loads a whole blocklet into memory, and there is a high chance of going out of memory.

Solution:
For this type of scenario we can introduce new V3 readers that read the data page by page instead of a whole blocklet at once, reducing the memory footprint, as sketched below.
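A minimal sketch of the idea, with hypothetical names rather than the actual CarbonData reader API: instead of decoding all pages of a blocklet up front, the reader exposes an iterator that decodes one page at a time, so a merge sort over many open segment readers holds at most one page per reader.

// Hypothetical stand-ins for CarbonData's blocklet/page structures.
case class Page(rows: Seq[Array[AnyRef]])

trait BlockletSource {
  def pageCount: Int
  def readPage(i: Int): Page   // reads and decodes a single page from storage
}

// Eager V3-style reading: every page of the blocklet is decoded at once.
class BlockletReader(src: BlockletSource) {
  def readAll(): Seq[Page] = (0 until src.pageCount).map(src.readPage)
}

// Page-by-page reading: only one decoded page per reader at any time.
class PageWiseReader(src: BlockletSource) extends Iterator[Page] {
  private var nextPage = 0
  def hasNext: Boolean = nextPage < src.pageCount
  def next(): Page = { val p = src.readPage(nextPage); nextPage += 1; p }
}

With 50 segments being compacted, the eager variant keeps 50 whole blocklets of decoded pages alive during the merge sort, while the page-wise variant keeps only 50 pages.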


