Cause of compaction exception?
hello, my application has been running for a long time, constantly updating and inserting into a table. I got a strange exception like the following:

    ERROR command.ProjectForUpdateCommand$: main Update operation passed. Exception in Horizontal Compaction. Please check logs.
    org.apache.spark.sql.execution.command.HorizontalCompactionException: Horizontal Update Compaction Failed for [e_carbon.prod_inst_his_c]. Compaction failed. Please check logs for more info.
    Exception in compaction java.lang.Exception: Compaction Failure in Merger Rdd.

Can anyone explain what may cause this exception?

sunerhan1...@sina.com
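For context while digging through the logs: horizontal compaction after update/delete is controlled by a few carbon.properties settings. This is only a sketch — the property names below are assumptions based on the 1.x configuration reference and should be verified against the docs for your exact version; they do not by themselves explain the failure, which the executor logs would show.

```properties
# carbon.properties (verify names against your version's configuration reference)
# master switch for horizontal compaction after update/delete
carbon.horizontal.compaction.enable=true
# number of update delta files that triggers horizontal update compaction
carbon.horizontal.UPDATE.compaction.threshold=1
# number of delete delta files that triggers horizontal delete compaction
carbon.horizontal.DELETE.compaction.threshold=1
```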
Is an index used when doing a "join" operation between a big table and a small table?
hello, I have 2 tables that need to be joined on their primary key; the primary key of both tables is of type String. There are 200 million rows in the big table and only 20 thousand rows in the small table. This join operation is quite slow. I want to know: is an index used when joining a big table with a small table? And how can I confirm whether the index is used?

sunerhan1...@sina.com
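Independent of whether CarbonData's index kicks in, a common way to speed up a big-table/small-table join in Spark is to broadcast the small table so the large one is never shuffled. A minimal sketch, assuming hypothetical table names `big_table` and `small_table` joined on a key column `id`, with `spark` an existing SparkSession:

```scala
import org.apache.spark.sql.functions.broadcast

// Hypothetical names; both tables are assumed registered with the session.
val big   = spark.table("big_table")    // ~200 million rows
val small = spark.table("small_table")  // ~20 thousand rows

// Hint Spark to broadcast the small table to every executor.
val joined = big.join(broadcast(small), Seq("id"))

// Inspect the physical plan: "BroadcastHashJoin" confirms the hint took effect.
joined.explain()
```

Checking the plan with `explain()` is also the general way to confirm what strategy (and which scan/filter pushdowns) a join actually uses.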
[1.2.0-SNAPSHOT] delete problem
hello, I have already created a JIRA issue, please check: https://issues.apache.org/jira/browse/CARBONDATA-1302

sunerhan1...@sina.com
problem with branch-1.1
hello, I tried to use branch-1.1 under HDP 2.6.0 (Spark 2.1.0) and met several problems.

1. Build: ran

       mvn package -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 -Phadoop-2.7.2

   While compiling the core module it threw a class-not-found error for org.apache.thrift.TBase. I modified the pom.xml in core to add a libthrift dependency, and it then compiled successfully.

2. Running in spark-shell:

       spark-shell --jars carbondata_2.11-1.1.1-SNAPSHOT-shade-hadoop2.7.2.jar
       scala> import org.apache.spark.sql.SparkSession
       scala> import org.apache.spark.sql.CarbonSession._
       scala> val cc = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession(hdfs path)

   and got this error:

       java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/CatalystConf
         at org.apache.spark.sql.hive.CarbonSessionState.analyzer$lzycompute(CarbonSessionState.scala:127)
         at org.apache.spark.sql.hive.CarbonSessionState.analyzer(CarbonSessionState.scala:126)
         at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
         at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
         at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
         at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
         at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
         ... 52 elided
       Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
         ... 59 more

   I checked spark-catalyst.jar and did not find the class CatalystConf.

3. Using the Apache Spark catalyst jar, I reran with

       spark-shell --jars carbondata_2.11-1.1.1-SNAPSHOT-shade-hadoop2.7.2.jar,spark-catalyst_2.11-2.1.0.jar

   and got the same NoClassDefFoundError for org/apache/spark/sql/catalyst/CatalystConf, with an identical stack trace to the one above.

sunerhan1...@sina.com
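The workaround described in step 1 amounts to adding libthrift to core/pom.xml. A sketch of that dependency follows; the version number is an assumption and should be matched to the thrift version used elsewhere in the CarbonData build rather than taken as-is:

```xml
<!-- Sketch: add to core/pom.xml so org.apache.thrift.TBase resolves at compile time.
     The version is an assumption; align it with the version the format module uses. -->
<dependency>
  <groupId>org.apache.thrift</groupId>
  <artifactId>libthrift</artifactId>
  <version>0.9.3</version>
</dependency>
```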
When is the merge operation planned to be implemented?
hello, my team is trying to implement a merge operation. The merge scenario is like the following: compare the records in two tables (same structure, different numbers of records) and modify the big one: 1. if small.id=big.id and small.date
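While there is no MERGE statement in these CarbonData versions, a merge of this shape is often emulated with joins. The following is only a sketch under assumed names (tables `big` and `small` sharing a schema with key column `id`, Spark 2.x, `cc` an existing session), not the project's planned implementation:

```scala
// Rows from `small` take precedence; rows of `big` with no match in `small`
// are kept unchanged. The union is the merged result.
val big    = cc.table("big")
val small  = cc.table("small")

val merged = small.union(
  big.join(small.select("id"), Seq("id"), "left_anti")  // big rows with no small.id match
)

// The merged DataFrame would then be written back to the target table
// using whatever load path your CarbonData version supports.
```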
Delete ERROR
As subqueries are not supported in Spark 1.6 + Carbon 1.1.0, I decided to prefetch the id values into a Scala list:

    spark-shell> var temp1 = cc.sql("select id from table limit 1000").select("id").rdd.map(r => r(0)).collect().mkString(",")
    spark-shell> cc.sql(s"""delete from e_carbon.V_ORDER_ITEM_PROC_ATTR_CARBON4 where ORDER_ITEM_PROC_ATTR_ID in ($temp1)""").show

and got error info like the following:

    WARN ExecutorAllocationManager: No stages are running, but numRunningTasks != 0
    AUDIT deleteExecution$: [HETL032][e_carbon][Thread-1]Delete data operation is failed for table
    ERROR deleteExecution$: main Delete data operation is failed due to failure in creating delete delta file for segment : null block : null

After the delete, I ran:

    cc.sql(s"""select count(*) from e_carbon.V_ORDER_ITEM_PROC_ATTR_CARBON4 where ORDER_ITEM_PROC_ATTR_ID in ($temp1)""").show

and the result is 1000, so nothing was deleted. The delete only succeeds with at most about 200 ids per batch, and each batch takes about 1 minute, which is too slow. So my question is: how do I tune performance to make the batches larger and the delete faster?

sunerhan1...@sina.com
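Until larger batches work, the ids can at least be deleted in the batch size that does succeed. A minimal sketch of the chunking, using the table and column names from the post (the `cc.sql` call itself is left as a comment since it needs a running CarbonSession):

```scala
// Split 1000 ids into batches of 200 and build one DELETE statement per batch.
val ids = (1 to 1000).map(_.toString)   // stand-in for the collected id values
val batchSize = 200

val statements = ids.grouped(batchSize).map { batch =>
  val inList = batch.mkString(",")
  s"delete from e_carbon.V_ORDER_ITEM_PROC_ATTR_CARBON4 where ORDER_ITEM_PROC_ATTR_ID in ($inList)"
}.toList

// Each statement would then be run as: cc.sql(stmt).show
println(statements.size)  // 5 batches of 200
```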
Class not found exception
I'm using Spark 2.1 + Carbon 1.1 (https://github.com/apache/carbondata/tree/apache-carbondata-1.1.0) and get a class-not-found exception; this class only exists in Spark 1.x and is imported by CodeGenFactory.scala. I built the jar using:

    mvn package -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 -Phadoop-2.7.2

Below are the error messages (also in the file attached):

    scala> import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.SparkSession

    scala> import org.apache.spark.sql.CarbonSession._
    import org.apache.spark.sql.CarbonSession._

    scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://192.168.14.78:8020/apps/hive/guoht/qqdatast
    17/05/18 10:42:41 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
    carbon: org.apache.spark.sql.SparkSession = org.apache.spark.sql.CarbonSession@2cfd9b0a

    scala> carbon.sql("CREATE TABLE IF NOT EXISTS test_table(id string, name string, city string, age Int) STORED BY 'carbondata'")
    java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/CatalystConf
      at org.apache.spark.sql.hive.CarbonSessionState.analyzer$lzycompute(CarbonSessionState.scala:127)
      at org.apache.spark.sql.hive.CarbonSessionState.analyzer(CarbonSessionState.scala:126)
      at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
      at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
      at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
      at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
      at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
      ... 50 elided
    Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
      at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
      ... 57 more

sunerhan1...@sina.com
Re: Re: why sort_columns?
I have replied to this question in another topic thread, as below:

First, please check this doc: http://carbondata.apache.org/useful-tips-on-carbondata.html, and see if it helps you to understand CarbonData's index usage.

As you mentioned, 1.2 will introduce the sort_columns feature to help users more easily specify which columns need an index, for example: "create table (c1,...,c7) tblproperties('sort_columns' = 'c7,c3')". As you know, before 1.1 CarbonData by default builds the MDK index as per the order of the columns in the create-table statement; that behavior will be kept in future versions, so you can still use "create table (c1,c2,...,c7)" to build the index.

HTH.

Regards
Liang

2017-05-14 19:31 GMT-07:00 sunerhan1...@sina.com <sunerhan1...@sina.com>:
> hi community,
>
> since we already have many RDBMS SQL scripts, we don't want to change them
> too much when migrating to Carbon. Suppose we already have a SQL script like
> "create table (c1,c2,...,c7)". If we want to change the column order to
> shift the most often used columns ahead when querying data, I think it's
> better to change it like "create table (c1,c2,c7,c4,...,c6,c3)" rather than
> "create table (c1,...,c7) tblproperties('sort_columns' = 'c7,c3')", because
> the former just reorders columns without adding extra settings, which is
> more readable to those who are familiar with RDBMS but not with Carbon.
>
> sunerhan1...@sina.com
--
Regards Liang
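The two approaches can be put side by side in spark-shell. A sketch with hypothetical table and column names (`cc` an existing CarbonSession; seven string columns for illustration):

```scala
// Option A (works before 1.2): the MDK index follows the column order in the
// create-table statement, so frequently filtered columns go first.
cc.sql("""create table t_reordered (c7 string, c3 string, c1 string, c2 string,
          c4 string, c5 string, c6 string) stored by 'carbondata'""")

// Option B (1.2+ sort_columns): keep the natural column order from the existing
// RDBMS script and name the index columns explicitly in tblproperties.
cc.sql("""create table t_sorted (c1 string, c2 string, c3 string, c4 string,
          c5 string, c6 string, c7 string) stored by 'carbondata'
          tblproperties('sort_columns' = 'c7,c3')""")
```

Option B is what lets existing scripts migrate with only an appended property rather than a reshuffled column list, which is the trade-off debated in this thread.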
why sort_columns?
hi community, since we already have many RDBMS SQL scripts, we don't want to change them too much when migrating to Carbon. Suppose we already have a SQL script like "create table (c1,c2,...,c7)". If we want to change the column order to shift the most often used columns ahead when querying data, I think it's better to change it like "create table (c1,c2,c7,c4,...,c6,c3)" rather than "create table (c1,...,c7) tblproperties('sort_columns' = 'c7,c3')", because the former just reorders columns without adding extra settings, which is more readable to those who are familiar with RDBMS but not with Carbon.

sunerhan1...@sina.com