Cause of Compaction?

2017-10-09 Thread sunerhan1...@sina.com
hello,

My application has been running for a long time, constantly updating and inserting into a table.

I got a strange exception like the following:

ERROR command.ProjectForUpdateCommand$: main Update operation passed. Exception 
in Horizontal Compaction. Please check 
logs.org.apache.spark.sql.execution.command.HorizontalCompactionException: 
Horizontal Update Compaction Failed for [e_carbon.prod_inst_his_c]. Compaction 
failed. Please check logs for more info. Exception in compaction 
java.lang.Exception  : Compaction Failure in Merger Rdd.

Can anyone explain what may cause this exception?



sunerhan1...@sina.com


Is the index used when doing a "join" operation between a big table and a small table?

2017-10-09 Thread sunerhan1...@sina.com
hello,

I have 2 tables that need to be joined on their primary keys; both primary keys are of type String.

There are 200 million rows in the big table and only 20 thousand rows in the small table.

This join operation is quite slow.
I want to know whether the index is used when joining a big table with a small table.

And how can I confirm whether the index is used?
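One way to speed this up and to see what the planner actually chose, as a sketch (table and column names below are hypothetical, not from the mail):

```scala
import org.apache.spark.sql.functions.broadcast

// Hypothetical table and key names. With only ~20k rows, hinting a
// broadcast join avoids shuffling the 200-million-row table.
val big   = spark.table("big_table")
val small = spark.table("small_table")

val joined = big.join(broadcast(small), Seq("id"))

// The physical plan shows which strategy was picked (BroadcastHashJoin
// vs SortMergeJoin) and which filters were pushed down to the scan.
joined.explain(true)
```

The broadcast hint is standard Spark and independent of CarbonData's index; whether the MDK index helps the scan is visible in the table-scan node of the printed plan.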





sunerhan1...@sina.com


[1.2.0-SNAPSHOT]-delete problem

2017-07-12 Thread sunerhan1...@sina.com
hello,
 I have already created a JIRA issue, please check:
 https://issues.apache.org/jira/browse/CARBONDATA-1302



sunerhan1...@sina.com


problem with branch-1.1

2017-06-23 Thread sunerhan1...@sina.com
hello,
I tried to use branch-1.1 on HDP 2.6.0 with Spark 2.1.0 and met several problems.
1. Build: ran "mvn package -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 -Phadoop-2.7.2"; while compiling the core module it threw a class-not-found error for org.apache.thrift.TBase. After adding a libthrift dependency to the core module's pom.xml, it compiled successfully.
2. Running in spark-shell:
spark-shell --jars carbondata_2.11-1.1.1-SNAPSHOT-shade-hadoop2.7.2.jar
scala> import org.apache.spark.sql.SparkSession
scala> import org.apache.spark.sql.CarbonSession._
scala> val cc = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession(hdfs path)
and got this error:
and got error:
java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/CatalystConf
at org.apache.spark.sql.hive.CarbonSessionState.analyzer$lzycompute(CarbonSessionState.scala:127)
at org.apache.spark.sql.hive.CarbonSessionState.analyzer(CarbonSessionState.scala:126)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
... 52 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 59 more
I checked spark-catalyst.jar and did not find the class CatalystConf.
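To confirm what a given distribution actually ships, you can check whether a class entry is present in a jar directly, since a jar is just a zip archive. A small sketch (the HDP-style jar path below is an assumption; adjust it for your installation):

```scala
import java.util.zip.ZipFile
import scala.collection.JavaConverters._

// Returns true if the jar contains the given class entry.
def jarContains(jarPath: String, classEntry: String): Boolean = {
  val zf = new ZipFile(jarPath)
  try zf.entries().asScala.exists(_.getName == classEntry)
  finally zf.close()
}

// Assumed location of the cluster's catalyst jar on an HDP install.
println(jarContains(
  "/usr/hdp/current/spark2-client/jars/spark-catalyst_2.11-2.1.0.jar",
  "org/apache/spark/sql/catalyst/CatalystConf.class"))
```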
3. Used the Apache Spark catalyst jar and reran with
spark-shell --jars carbondata_2.11-1.1.1-SNAPSHOT-shade-hadoop2.7.2.jar,spark-catalyst_2.11-2.1.0.jar
and got the same java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/CatalystConf with an identical stack trace to the one above.



sunerhan1...@sina.com


when planning to implement a merge operation

2017-05-25 Thread sunerhan1...@sina.com
hello,
my team is trying to implement a merge operation;
the merge scenario is like the following:
 compare the records in two tables (same structure, different amounts of records) and modify the big one:
1. if small.id=big.id and small.date
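The scenario is truncated in the archive, but it reads like an upsert: rows of the big table matching the small table on id (plus a date condition that is cut off) get replaced by the small table's version. A DataFrame-level sketch under that assumption, with hypothetical table names and both tables sharing one schema as the mail states:

```scala
// cc is the CarbonSession from earlier mails; big_t and small_t are
// hypothetical names for the two same-structured tables.
val big   = cc.table("big_t")
val small = cc.table("small_t")

// Rows of big whose id does not occur in small are kept unchanged.
val untouched = big.join(small.select("id"), Seq("id"), "left_anti")

// Matching rows are taken from small instead; the truncated
// small.date condition would be added to this join.
val replaced = small.join(big.select("id"), Seq("id"), "left_semi")

// Since both tables have the same structure, a positional union works.
val merged = untouched.union(replaced)
```

The merged result would then be written back in whatever way your setup supports; this is only a sketch of the semantics, not CarbonData's own merge implementation.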

Delete ERROR

2017-05-22 Thread sunerhan1...@sina.com
As subqueries are not supported in Spark 1.6 + Carbon 1.1.0, I decided to prefetch the id values into a Scala string:
spark-shell>>
var temp1=cc.sql("select id from table limit 1000").select("id").rdd.map(r => r(0)).collect().mkString(",")
cc.sql(s"""delete from e_carbon.V_ORDER_ITEM_PROC_ATTR_CARBON4 where ORDER_ITEM_PROC_ATTR_ID in ( $temp1) """).show
and got error info like the following:
WARN ExecutorAllocationManager: No stages are running, but numRunningTasks != 0
AUDIT deleteExecution$: [HETL032][e_carbon][Thread-1]Delete data operation is failed for table
ERROR deleteExecution$: main Delete data operation is failed due to failure in creating delete delta file for segment : null block : null
After deleting, I ran:
   cc.sql(s"""select count(*) from e_carbon.V_ORDER_ITEM_PROC_ATTR_CARBON4 where ORDER_ITEM_PROC_ATTR_ID in ( $temp1) """).show
and the result was 1000.
The delete only succeeds with at most 200 ids per batch, and it took about 1 minute, which is too slow.
So my question is how to tune the performance to make the batches larger and the deletes faster.
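Until larger batches work, one workaround consistent with the observed 200-id limit is to loop over the id list in batches (a sketch; the CarbonData table and column names are taken from the mail, and "table" in the select is the mail's own placeholder):

```scala
// Prefetch the ids as in the mail, then delete in batches of 200,
// the batch size that was observed to succeed.
val ids = cc.sql("select id from table limit 1000")
  .rdd.map(_(0).toString).collect()

ids.grouped(200).foreach { batch =>
  val inList = batch.mkString(",")
  cc.sql(s"""delete from e_carbon.V_ORDER_ITEM_PROC_ATTR_CARBON4
             where ORDER_ITEM_PROC_ATTR_ID in ($inList)""").show()
}
```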


sunerhan1...@sina.com


class not found exception

2017-05-18 Thread sunerhan1...@sina.com
I'm using Spark 2.1 + Carbon 1.1 (https://github.com/apache/carbondata/tree/apache-carbondata-1.1.0) and get a class-not-found exception; this class only exists in Spark 1.x and is imported by CodeGenFactory.scala.
I built the jar using: mvn package -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 -Phadoop-2.7.2

Below are the error messages (also in the attached file):
scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.CarbonSession._

scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://192.168.14.78:8020/apps/hive/guoht/qqdatast
17/05/18 10:42:41 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
carbon: org.apache.spark.sql.SparkSession = org.apache.spark.sql.CarbonSession@2cfd9b0a

scala> carbon.sql("CREATE TABLE IF NOT EXISTS test_table(id string, name string, city string, age Int) STORED BY 'carbondata'")
java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/CatalystConf
  at org.apache.spark.sql.hive.CarbonSessionState.analyzer$lzycompute(CarbonSessionState.scala:127)
  at org.apache.spark.sql.hive.CarbonSessionState.analyzer(CarbonSessionState.scala:126)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
  at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
  ... 50 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  ... 57 more



sunerhan1...@sina.com


class not found exception

2017-05-17 Thread sunerhan1...@sina.com
I'm using Spark 2.1 + Carbon 1.1 and get a class-not-found exception; this class only exists in Spark 1.x and is imported by CodeGenFactory.scala.
I built the jar using: mvn package -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 -Phadoop-2.7.2




sunerhan1...@sina.com


Re: Re: why sort_columns?

2017-05-14 Thread sunerhan1...@sina.com
I have replied to this question in another topic, as below:

First, please check this doc: http://carbondata.apache.org/useful-tips-on-carbondata.html and see if it helps you understand CarbonData's index usage.

As you mentioned, 1.2 will introduce the sort columns feature to help
users more easily specify which columns need an index, for example:
"create table(c1...c7) tblproperties('sort_columns' = 'c7,c3')".

As you know, before 1.1 CarbonData by default builds the MDK index in the
order of the columns in the create-table statement; this behavior will be
kept in future versions, so you can still use "create table (c1,c2,...c7)"
to build the index.
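For example, the two ways of controlling the index side by side, on a hypothetical 7-column schema:

```scala
// Before 1.2: the MDK index order follows the declared column order,
// so c1 is the leading index column here.
carbon.sql("""CREATE TABLE t_by_order (
    c1 STRING, c2 STRING, c3 STRING, c4 STRING,
    c5 STRING, c6 STRING, c7 STRING)
  STORED BY 'carbondata'""")

// With 1.2's sort_columns: the declaration keeps its readable order,
// while the index is built on c7 first, then c3.
carbon.sql("""CREATE TABLE t_by_property (
    c1 STRING, c2 STRING, c3 STRING, c4 STRING,
    c5 STRING, c6 STRING, c7 STRING)
  STORED BY 'carbondata'
  TBLPROPERTIES ('SORT_COLUMNS' = 'c7,c3')""")
```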

HTH.

Regards
Liang

2017-05-14 19:31 GMT-07:00 sunerhan1...@sina.com <sunerhan1...@sina.com>:

> hi community,
>
> Since we already have many RDBMS SQL scripts, we don't want to change them
> too much when migrating to Carbon.
> Suppose we already have a SQL script like:
>    "create table (c1,c2,...c7)",
> If we want to change the column order to shift the most often used columns
> ahead for querying, I think it's better to change it this way:
>   "create table(c1,c2,c7,c4..c6,c3)"
> rather than
>   "create table(c1...c7) tblproperties('sort_columns' = 'c7,c3')"
> because the former just reorders the columns without adding extra
> settings, which is more readable to those who are familiar with RDBMS but
> not with Carbon.
>
>
>
> sunerhan1...@sina.com
>



-- 
Regards
Liang


why sort_columns?

2017-05-14 Thread sunerhan1...@sina.com
hi community,

Since we already have many RDBMS SQL scripts, we don't want to change them too much when migrating to Carbon.
Suppose we already have a SQL script like:
   "create table (c1,c2,...c7)",
If we want to change the column order to shift the most often used columns ahead for querying, I think it's better to change it this way:
  "create table(c1,c2,c7,c4..c6,c3)"
rather than
  "create table(c1...c7) tblproperties('sort_columns' = 'c7,c3')"
because the former just reorders the columns without adding extra settings, which is more readable to those who are familiar with RDBMS but not with Carbon.



sunerhan1...@sina.com