Re: carbontable compact throw err

2017-01-03 Thread geda
Hello, 1 and 2 are OK; 3 throws the error.





carbontable compact throw err

2017-01-03 Thread geda
Hello:
In spark-shell, from CarbonContext, I ran
cc.sql("ALTER TABLE test COMPACT 'MINOR'")
and an error happened.
How can it be solved?
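
In case it helps to reproduce, here is a minimal spark-shell sketch of the same call. The store path is an assumption, and carbon.compaction.level.threshold is taken from the CarbonData configuration docs; whether it applies to this exact version is also an assumption.

import org.apache.spark.sql.CarbonContext
import org.apache.carbondata.core.util.CarbonProperties

// sc is the spark-shell SparkContext; the store path here is an assumption
val cc = new CarbonContext(sc, "hdfs://test:8020/usr/carbondata/store")

// Assumption: carbon.compaction.level.threshold controls how many segments a
// MINOR compaction merges per level, e.g. "4,3" (4 at level 1, 3 at level 2)
CarbonProperties.getInstance()
  .addProperty("carbon.compaction.level.threshold", "4,3")

cc.sql("ALTER TABLE test COMPACT 'MINOR'")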


17/01/03 18:52:32 INFO CarbonDataRDDFactory$: main Acquired the compaction
lock for table test
17/01/03 18:52:32 INFO CarbonDataRDDFactory$: main loads identified for
merge is 0
17/01/03 18:52:32 INFO CarbonDataRDDFactory$: main loads identified for
merge is 1
17/01/03 18:52:32 INFO CarbonDataRDDFactory$: main loads identified for
merge is 2
17/01/03 18:52:32 INFO CarbonDataRDDFactory$: main loads identified for
merge is 3
17/01/03 18:52:32 INFO CarbonDataRDDFactory$: main loads identified for
merge is 4
17/01/03 18:52:32 INFO CarbonDataRDDFactory$: main loads identified for
merge is 5
17/01/03 18:52:32 INFO CarbonDataRDDFactory$: main loads identified for
merge is 6
17/01/03 18:52:32 INFO CarbonDataRDDFactory$: main loads identified for
merge is 7
17/01/03 18:52:32 INFO CarbonDataRDDFactory$: main loads identified for
merge is 8
17/01/03 18:52:32 INFO CarbonDataRDDFactory$: main loads identified for
merge is 9
17/01/03 18:52:32 INFO Compactor$: pool-26-thread-1 spark.executor.instances
property is set to =20
17/01/03 18:52:32 INFO BlockBTreeBuilder: pool-26-thread-1
Total Number Rows In BTREE: 1
17/01/03 18:52:32 INFO BlockBTreeBuilder: pool-26-thread-1
Total Number Rows In BTREE: 1
17/01/03 18:52:32 INFO BlockBTreeBuilder: pool-26-thread-1
Total Number Rows In BTREE: 1
17/01/03 18:52:32 INFO BlockBTreeBuilder: pool-26-thread-1
Total Number Rows In BTREE: 1
17/01/03 18:52:32 ERROR CarbonDataRDDFactory$: main Exception in compaction
thread java.io.IOException: java.lang.NullPointerException
17/01/03 18:52:32 ERROR CarbonDataRDDFactory$: main Exception in compaction
thread java.io.IOException: java.lang.NullPointerException





how to make carbon run faster

2016-12-30 Thread geda
Hello:
I tested the same data and the same SQL in two formats, 1. carbondata and 2. hive ORC,
but the carbon format runs slower than ORC.
I use carbondata with the index order matching the create-table column order.
hive sql (dt is the partition dir):
select count(1) as total ,status,d_id from test_orc where status !=17 and
v_id  in ( 91532,91533,91534,91535,91536,91537,10001 )  and   dt >=
'2016-11-01'  and  dt <= '2016-12-26' group by status,d_id order by total
desc
carbon sql (create_time is a timestamp column):

select count(1) as total ,status,d_id from test_carbon where status !=17 and
v_id  in ( 91532,91533,91534,91535,91536,91537,10001 )  and 
date(a.create_time)>= '2016-11-01' and  date(a.create_time)<= '2016-12-26'
group by status,d_id order by total desc

The carbondata table was created like:
CREATE TABLE test_carbon ( status int, v_id bigint, d_id bigint, create_time
timestamp
...
...
'DICTIONARY_INCLUDE'='status,d_id,v_id,create_time')

Run with spark-shell on 40 nodes, spark 1.6.1, carbon 0.2.0, hadoop-2.6.3.
The data is like:
2 months (60 days), about 300k rows per day, 600MB of CSV per day
 $SPARK_HOME/bin/spark-shell --verbose --name "test"   --master yarn-client 
--driver-memory 10G   --executor-memory 16G --num-executors 40
--executor-cores 1 
I tested many cases:
1. GC tuning; there is no full GC.
2. spark.sql.shuffle.partitions, so that all tasks run at the same time (see the sketch after this list).
3. carbon.conf set enable.blocklet.distribution=true
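
A rough sketch of how 2. and 3. could be applied from the shell. The value 40 (one shuffle partition per executor) and setting enable.blocklet.distribution programmatically instead of in carbon.properties are assumptions.

import org.apache.carbondata.core.util.CarbonProperties

cc.setConf("spark.sql.shuffle.partitions", "40")        // assumption: match the 40 executors
CarbonProperties.getInstance()
  .addProperty("enable.blocklet.distribution", "true")  // usually set in carbon.properties instead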

I use this code to measure the SQL run time (elapsed milliseconds):

def time[T](body: => T): Long = {
  val start = System.nanoTime()
  body
  (System.nanoTime() - start) / 1000 / 1000
}

where body is sqlContext.sql(sql).show().
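
For example, one comparison run could be measured like this (orcQuery and carbonQuery are hypothetical vals holding the two SQL strings above; hiveContext reads the ORC table and cc, the CarbonContext, reads the carbon table):

val orcMs    = time { hiveContext.sql(orcQuery).show() }
val carbonMs = time { cc.sql(carbonQuery).show() }
println(s"orc: $orcMs ms, carbon: $carbonMs ms")
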
I find ORC returns results faster than carbon.

Looking at the UI, sometimes carbon and ORC run more or less the same (I think
carbon, using the index, should be faster, unless a sequential scan read is faster
than an index scan), but ORC is more stable.
The UI shows about 5s spent, but the time to get results back is 8s for ORC and
12s for carbon (I don't know how to find out where the time is spent).

Here are some screenshots of the runs (run many times):

carbon run: [screenshots not preserved in the archive]

orc run: [screenshots not preserved in the archive]

So my questions are:
1. Why, in spark-shell with sql.show(), does the ORC query return faster than carbon?
2. In the Spark UI, carbon should use the index to skip more data; the scan sometimes
takes 4s, 2s, or 0.2s. How can the slowest task be made faster?
3. In this SQL I filter on the leftmost index columns, so I think it should run
faster than the ORC test in this case, but it does not. Why?
4. If the answer to question 3 is that my data is too small, does that mean a
sequential read is faster than an index scan here?

Sorry for my poor English. Help, thanks!










Re: carbondata test join question

2016-12-14 Thread geda
Hi, thanks.
For example, table A has 200+ fields, and this SQL uses id and v_id, which are at
positions 3 and 4. Should I put them in the first or second position?
The cardinality of id / v_id is about 30,000 to 70,000. A sketch of the layout I
have in mind is below.
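
A minimal DDL sketch of that layout, following the suggestion quoted below. The table and column names other than id/v_id are assumptions, the remaining 200+ columns are elided, and keeping these columns in DICTIONARY_INCLUDE at this cardinality is also an assumption.

cc.sql(
  "CREATE TABLE a_table (id BIGINT, v_id BIGINT, d_id BIGINT, status INT, create_time TIMESTAMP) " +
  "STORED BY 'carbondata' " +
  "TBLPROPERTIES ('DICTIONARY_INCLUDE'='id,v_id,d_id,status')")
// id and v_id moved to the leftmost positions; the quoted reply suggests
// switching to no-dictionary once cardinality goes past ~100 thousand.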

2016-12-15 9:29 GMT+08:00 杰 [via Apache CarbonData Mailing List archive] <
ml-node+s1130556n4442...@n5.nabble.com>:

> hi, geda
> Can you share your create DDL?
> Some suggestions: for that filter field (like id), you can try to put it in a
> left column and use DICTIONARY_INCLUDE or EXCLUDE to make it a dimension. If
> the cardinality is more than 100 thousand, you can try making it no-dictionary.
> As for your question, if all the dimensions are no-dictionary, there will
> be no decode part.
>
>
> Thanks
> Jay
>
>
>
>
>
> -- Original --
> From: "geda" <[hidden email]>
> Date: Wed, Dec 14, 2016 11:10 PM
> To: "dev" <[hidden email]>
>
> Subject: carbondata test join question
>
>
>
> hello
> I want to test ORC vs. carbon to see which is faster.
> Tested on YARN with 8 executors, each executor 4G, 2 cores each.
> Spark 1.6, carbon 0.2
> SQL: a join of 3 tables
> A: 3,000,000 rows, 5GB
> B: 130,000 rows, 30MB
> C: 70,000 rows, 10MB
> like this:
> select b.id, b.d_name, a.v_no, count(*) o_count from a left join b on
> a.d_id=b.id left join c on c.v_no=a.v_no where date(a.create_time) >=
> '2016-07-01' and date(a.create_time) <= '2016-09-02' group by b.id,
> b.d_name, a.v_no having o_count > 30 order by b.id desc
> Using the contexts:
> cc.sql($SQL).show() : carbondata : run 5 times, avg time 7.3s
> hiveContext.sql($SQL).show() : ORC : run 5 times, avg time 5.3s
> From the DAG I find carbon has one more job, the carbon decode; finishing this
> job costs 2-3s.
> If that job is stripped out, carbon and ORC take more or less the same time.
> I want to know how to strip that last stage, or how to tune SQL like this.
> Thanks
> <http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/file/n4440/jobtrace.png>
> <http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/file/n4440/laststage-carbon.png>
> <http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/file/n4440/laststage-orc.png>
>
>
>
>
>
>





carbondata test join question

2016-12-14 Thread geda
hello
I want to test ORC vs. carbon to see which is faster.
Tested on YARN with 8 executors, each executor 4G, 2 cores each.
Spark 1.6, carbon 0.2
SQL: a join of 3 tables
A: 3,000,000 rows, 5GB
B: 130,000 rows, 30MB
C: 70,000 rows, 10MB
like this:
select b.id, b.d_name, a.v_no, count(*) o_count from a left join b on
a.d_id=b.id left join c on c.v_no=a.v_no where date(a.create_time) >=
'2016-07-01' and date(a.create_time) <= '2016-09-02' group by b.id,
b.d_name, a.v_no having o_count > 30 order by b.id desc
Using the contexts:
cc.sql($SQL).show() : carbondata : run 5 times, avg time 7.3s
hiveContext.sql($SQL).show() : ORC : run 5 times, avg time 5.3s
From the DAG I find carbon has one more job, the carbon decode; finishing this
job costs 2-3s.
If that job is stripped out, carbon and ORC take more or less the same time.
I want to know how to strip that last stage, or how to tune SQL like this.
Thanks

<http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/file/n4440/jobtrace.png>
<http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/file/n4440/laststage-carbon.png>
<http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/file/n4440/laststage-orc.png>




Re: carbondata-0.2 load data failed in yarn mode

2016-12-09 Thread geda
Yes, it works now. Thanks.
Following the wiki quick start, which uses spark-default.conf, one should configure
carbon.properties, but I used spark-shell, which did not include carbon.properties.
A sketch of one way to wire it in is below.


2016-12-09 11:48 GMT+08:00 Liang Chen [via Apache CarbonData Mailing List
archive] :

> Hi
>
> Have you solved this issue after applying new configurations?
>
> Regards
> Liang
>
> geda wrote
> hello:
> I tested data in Spark local mode, then loaded data inpath into the table; it
> works well.
> But when I use yarn-client mode, with 10,000 rows (size: 940k), an error
> happens. There is no lock file found in the tmp dir, and I don't know how to
> debug it. Help, thanks.
> spark1.6 hadoop 2.7|2.6 carbondata 0.2
> local mode: run ok
> $SPARK_HOME/bin/spark-shell --master local[4]  --jars /usr/local/spark/lib/
> carbondata_2.10-0.2.0-incubating-shade-hadoop2.7.1.jar
>
>
> yarn command : run bad
>  $SPARK_HOME/bin/spark-shell --verbose  --master yarn-client
> --driver-memory 1G --driver-cores 1   --executor-memory 4G --num-executors
> 5 --executor-cores 1 --conf "spark.executor.extraJavaOptions=-XX:NewRatio=2
> -XX:PermSize=512m -XX:MaxPermSize=512m -XX:SurvivorRatio=6  -verbose:gc
> -XX:-PrintGCDetails -XX:+PrintGCTimeStamps " --conf "spark.driver.
> extraJavaOptions=-XX:MaxPermSize=512m -XX:PermSize=512m"  --conf
> spark.yarn.driver.memoryOverhead=1024 --conf 
> spark.yarn.executor.memoryOverhead=3096
>--jars /usr/local/spark/lib/carbondata_2.10-0.2.0-
> incubating-shade-hadoop2.7.1.jar
>
> import java.io._
> import org.apache.hadoop.hive.conf.HiveConf
> import org.apache.spark.sql.CarbonContext
> val storePath = "hdfs://test:8020/usr/carbondata/store"
> val cc = new CarbonContext(sc, storePath)
> cc.setConf(HiveConf.ConfVars.HIVECHECKFILEFORMAT.varname, "false")
> cc.setConf("carbon.kettle.home","/usr/local/spark/carbondata/carbonplugins")
>
> cc.sql("CREATE TABLE `LINEORDER3` (   LO_ORDERKEY   bigint,
> LO_LINENUMBER int,   LO_CUSTKEYbigint,   LO_PARTKEY
>  bigint,   LO_SUPPKEYbigint,   LO_ORDERDATE  int,
> LO_ORDERPRIOTITY  string,   LO_SHIPPRIOTITY   int,   LO_QUANTITY   int,
>   LO_EXTENDEDPRICE  int,   LO_ORDTOTALPRICE  int,   LO_DISCOUNT   int,
>   LO_REVENUEint,   LO_SUPPLYCOST int,   LO_TAXint,
>   LO_COMMITDATE int,   LO_SHIPMODE   string ) STORED BY
> 'carbondata'")
> cc.sql(s"load data local inpath 'hdfs://test:8020/tmp/lineorder_1w.tbl'
>  into table lineorder3 options('DELIMITER'='|', 'FILEHEADER'='LO_ORDERKEY,
> LO_LINENUMBER, LO_CUSTKEY, LO_PARTKEY , LO_SUPPKEY , LO_ORDERDATE ,
> LO_ORDERPRIOTITY ,   LO_SHIPPRIOTITY , LO_QUANTITY ,LO_EXTENDEDPRICE ,
> LO_ORDTOTALPRICE ,LO_DISCOUNT , LO_REVENUE  ,   LO_SUPPLYCOST,   LO_TAX,
> LO_COMMITDATE,   LO_SHIPMODE')")
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 2.0 (TID 8, datanode03-bi-dev): java.lang.RuntimeException: Dictionary file
> lo_orderpriotity is locked for updation. Please try after some time
> at scala.sys.package$.error(package.scala:27)
> at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerate
> RDD$$anon$1.<init>(CarbonGlobalDictionaryRDD.scala:353)
> at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerate
> RDD.compute(CarbonGlobalDictionaryRDD.scala:293)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
> at java.lang.Thread.run(Thread.java:745)
>
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$
> scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> abortStage$1.apply(DAGScheduler.scala:1419)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> abortStage$1.apply(DAGScheduler.scala:1418)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>
> at 
> org.ap

carbondata-0.2 load data failed in yarn mode

2016-12-06 Thread geda
hello:
I tested data in Spark local mode, then loaded data inpath into the table; it
works well.
But when I use yarn-client mode, with 10,000 rows, an error happens. There is no
lock file found in the tmp dir, and I don't know how to debug it. Help, thanks.
local mode: runs OK
$SPARK_HOME/bin/spark-shell --master local[4]  --jars
/usr/local/spark/lib/carbondata_2.10-0.2.0-incubating-shade-hadoop2.7.1.jar


yarn command: fails
 $SPARK_HOME/bin/spark-shell --verbose  --master yarn-client --driver-memory
1G --driver-cores 1   --executor-memory 4G --num-executors 5
--executor-cores 1 --conf "spark.executor.extraJavaOptions=-XX:NewRatio=2
-XX:PermSize=512m -XX:MaxPermSize=512m -XX:SurvivorRatio=6  -verbose:gc
-XX:-PrintGCDetails -XX:+PrintGCTimeStamps " --conf
"spark.driver.extraJavaOptions=-XX:MaxPermSize=512m -XX:PermSize=512m" 
--conf spark.yarn.driver.memoryOverhead=1024 --conf
spark.yarn.executor.memoryOverhead=3096--jars
/usr/local/spark/lib/carbondata_2.10-0.2.0-incubating-shade-hadoop2.7.1.jar

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0
(TID 8, datanode03-bi-dev): java.lang.RuntimeException: Dictionary file
lo_orderpriotity is locked for updation. Please try after some time
at scala.sys.package$.error(package.scala:27)
at
org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.<init>(CarbonGlobalDictionaryRDD.scala:353)
at
org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:293)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
at
org.apache.carbondata.spark.util.GlobalDictionaryUtil$.generateGlobalDictionary(GlobalDictionaryUtil.scala:800)
at
org.apache.spark.sql.execution.command.LoadTableUsingKettle.run(carbonTableSchema.scala:1197)
at
org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1036)
at
org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at
org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at
org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)

Re: carbondata load failed then select error

2016-11-30 Thread geda
I went back to the release version 0.2, and it works well.
Thanks





Re: carbondata load failed then select error

2016-11-30 Thread geda
More details:
http://pastebin.com/Myp6aubs





carbondata load failed then select error

2016-11-30 Thread geda
Hi:
I cloned from git, branch master, and compiled with mvn for hadoop 2.6.3 and spark 1.6.1.
Following the quick start, I then ran spark-shell:
 $SPARK_HOME/bin/spark-shell --verbose  --master local[4]  --jars
/usr/local/spark/lib/carbondata_2.10-0.3.0-incubating-SNAPSHOT-shade-hadoop2.6.3.jar,/usr/local/spark/lib/mysql-connector-java-5.1.38-bin.jar
then used :paste with:

import java.io._
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.spark.sql.CarbonContext

val storePath = "hdfs://test.namenode02.bi.com:8020/usr/carbondata/store"
val cc = new CarbonContext(sc, storePath)
cc.setConf(HiveConf.ConfVars.HIVECHECKFILEFORMAT.varname, "false")
cc.setConf("carbon.kettle.home","/usr/local/spark/carbondata/carbonplugins")

cc.sql("create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'")

cc.sql(s"load data inpath 'hdfs://test.namenode02.bi.com:8020/tmp/sample.csv' into table test_table")

cc.sql("select * from test_table").show



1:

The load failed, but the table could be created.

Table MetaData Unlocked Successfully after data load

> java.lang.RuntimeException: Table is locked for updation. Please try
> after some time

This looks like:
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/load-data-fail-td100.html#a164

After a chmod 777, the load runs.

But when I run cc.sql("select * from test_table").show, I get:

INFO  30-11 18:24:01,072 - Parse Completed

INFO  30-11 18:24:01,196 - main Starting to optimize plan

INFO  30-11 18:24:01,347 - main Total Number Rows In
BTREE: 1

INFO  30-11 18:24:01,361 - main Total Number Rows In
BTREE: 1

INFO  30-11 18:24:01,369 - main Total Number Rows In
BTREE: 1

INFO  30-11 18:24:01,376 - main Total Number Rows In
BTREE: 1

INFO  30-11 18:24:01,385 - main Total Number Rows In
BTREE: 1

INFO  30-11 18:24:01,386 - main Total Time taken to ensure the required
executors: 0

INFO  30-11 18:24:01,386 - main Time elapsed to allocate the required
executors: 0

INFO  30-11 18:24:01,391 - 

 Identified no.of.blocks: 5,

 no.of.tasks: 4,

 no.of.nodes: 1,

 parallelism: 4

   

INFO  30-11 18:24:01,396 - Starting job: show at :37

INFO  30-11 18:24:01,396 - Got job 3 (show at :37) with 1 output
partitions

INFO  30-11 18:24:01,396 - Final stage: ResultStage 4 (show at :37)

INFO  30-11 18:24:01,396 - Parents of final stage: List()

INFO  30-11 18:24:01,397 - Missing parents: List()

INFO  30-11 18:24:01,397 - Submitting ResultStage 4 (MapPartitionsRDD[20] at
show at :37), which has no missing parents

INFO  30-11 18:24:01,401 - Block broadcast_6 stored as values in memory
(estimated size 13.3 KB, free 285.6 KB)

INFO  30-11 18:24:01,403 - Block broadcast_6_piece0 stored as bytes in
memory (estimated size 6.7 KB, free 292.2 KB)

INFO  30-11 18:24:01,403 - Added broadcast_6_piece0 in memory on
localhost:15792 (size: 6.7 KB, free: 511.1 MB)

INFO  30-11 18:24:01,404 - Created broadcast 6 from broadcast at
DAGScheduler.scala:1006

INFO  30-11 18:24:01,404 - Submitting 1 missing tasks from ResultStage 4
(MapPartitionsRDD[20] at show at :37)

INFO  30-11 18:24:01,404 - Adding task set 4.0 with 1 tasks

INFO  30-11 18:24:01,405 - Starting task 0.0 in stage 4.0 (TID 6, localhost,
partition 0,PROCESS_LOCAL, 2709 bytes)

INFO  30-11 18:24:01,406 - Running task 0.0 in stage 4.0 (TID 6)

INFO  30-11 18:24:01,436 - [Executor task launch
worker-1][partitionID:table;queryID:10219962900397098] Query will be
executed on table: test_table

ERROR 30-11 18:24:01,444 - Exception in task 0.0 in stage 4.0 (TID 6)

java.lang.InterruptedException: 

at
org.apache.carbondata.hadoop.CarbonRecordReader.initialize(CarbonRecordReader.java:83)

at
org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:171)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)

at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)

at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)

at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)

at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)

at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)

at org.apache.spar