Re: ClassNotFound error when inserting into carbon table from hive table

2017-08-22 Thread Lu Cao
It was just a configuration error: the carbondata jar was not at the
location configured in spark-defaults.conf. After I corrected the path,
the problem was fixed.
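
For reference, the relevant spark-defaults.conf entries look roughly like
the sketch below; the jar path is a placeholder, not the actual one from
my cluster:

    # spark-defaults.conf: the driver and every executor need the
    # carbondata jar on their classpath; a wrong path here produces
    # exactly the ClassNotFoundException on the executors seen in the log.
    spark.driver.extraClassPath     /opt/carbondata/carbonlib/carbondata.jar
    spark.executor.extraClassPath   /opt/carbondata/carbonlib/carbondata.jar

A quick way to confirm the executors can actually load the class is a
diagnostic like this from the shell (a sketch, assuming a live
SparkContext named sc):

    // Ask a few tasks whether the CarbonData RDD class resolves on the
    // executors; "missing" in the output means that executor's classpath
    // still lacks the carbondata jar.
    sc.parallelize(1 to 4, 4).map { _ =>
      try {
        Class.forName("org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD")
        "ok"
      } catch { case _: ClassNotFoundException => "missing" }
    }.collect().foreach(println)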


thanks,
Lionel

On Tue, Aug 22, 2017 at 3:17 PM, Liang Chen  wrote:

> Hi lionel
>
> Can you share with us how you fixed this issue?
>
> Regards
> Liang
>
>
> lionel061201 wrote
> > This issue has been fixed.
> >
> > On Mon, Aug 21, 2017 at 4:04 PM, Lu Cao <whucaolu@> wrote:
> >
> >> Hi dev,
> >>
> >> I'm trying to insert data from a hive table into a carbon table:
> >>
> >> cc.sql("insert into carbon_test select * from target_table where pt =
> >> '20170101'")
> >>
> >>
> >> Does anyone know how to fix this error?
> >>
> >> [Stage 8:> (0 + 4) / 156]17/08/21 15:59:01 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 8.0 (TID 48, , executor 16): java.lang.ClassNotFoundException: org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD
> >>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:82)
> >>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> >>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> >>   at java.lang.Class.forName0(Native Method)
> >>   at java.lang.Class.forName(Class.java:270)
> >>   at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
> >>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
> >>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
> >>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
> >>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> >>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> >>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> >>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> >>   at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
> >>   at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
> >>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
> >>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> >>   at org.apache.spark.scheduler.Task.run(Task.scala:99)
> >>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
> >>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>   at java.lang.Thread.run(Thread.java:745)
> >> Caused by: java.lang.ClassNotFoundException: org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD
> >>   at java.lang.ClassLoader.findClass(ClassLoader.java:531)
> >>   at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
> >>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> >>   at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
> >>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> >>   at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
> >>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:77)
> >>   ... 23 more
> >>
> >> [Stage 8:> (0 + 4) / 156]17/08/21 15:59:02 ERROR scheduler.TaskSetManager: Task 1 in stage 8.0 failed 4 times; aborting job
> >> 17/08/21 15:59:02 ERROR util.GlobalDictionaryUtil$: main generate global dictionary failed
> >> org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 8.0 failed 4 times, most recent failure: Lost task 1.3 in stage 8.0 (TID 61, scsp00382.saicdt.com, executor 16): java.lang.ClassNotFoundException: org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD
> >>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:82)
> >>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> >>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> >>   at java.lang.Class.forName0(Native Method)
> >>   at java.lang.Class.forName(Class.java:270)
> >>   at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
> >>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
> >>   at java.io.Obje

Compatibility among carbon[1.1.1][1.1.0]+spark[1.6][2.1.0]

2017-08-22 Thread sunerhan1...@sina.com
Hi, all,
Recently I wanted to test carbon1.1.1+spark2.1.0 performance; the test
environment already has carbon1.1.0+spark1.6 and carbon1.1.1+spark1.6,
with carbon tables already generated.
As a rough test with carbon1.1.1+spark2.1.0, I tried reading the carbon
tables generated by carbon1.1.0+spark1.6 and carbon1.1.1+spark1.6, as
well as a delete with a subquery on data generated by
carbon1.1.1+spark1.6, and everything worked OK (a sketch of that test
follows).
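
A minimal sketch of that rough test, assuming the CarbonSession API from
the carbondata 1.1.1 + spark 2.1.0 integration; the store path and table
names here are placeholders, not the real ones from my environment:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.CarbonSession._

    // Point a CarbonSession at the store written by the older
    // carbon+spark combinations.
    val carbon = SparkSession
      .builder()
      .appName("carbon-compat-check")
      .getOrCreateCarbonSession("hdfs:///user/carbon/store") // placeholder path

    // Read a table generated by carbon1.1.0+spark1.6.
    carbon.sql("SELECT COUNT(*) FROM table_from_110").show()

    // Delete with a subquery on a table generated by carbon1.1.1+spark1.6.
    carbon.sql("DELETE FROM table_from_111 WHERE id IN (SELECT id FROM stale_ids)")
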
My question is: do these combinations have compatibility problems that
may impair performance, or should I regenerate the carbon tables using
carbon1.1.1+spark2.1.0?
I break my question into three parts:
1. Will IUD and other operations (like sub-queries) work fine when using
carbon1.1.1+spark2.1.0 on tables created by carbon1.1.0+spark1.6?
2. Will using carbon1.1.1+spark2.1.0 on tables created by
carbon1.1.0+spark1.6 impair performance, or should the data be
regenerated with carbon1.1.1+spark2.1.0 while testing
carbon1.1.1+spark2.1.0?
3. Will using carbon1.1.1+spark2.1.0 on tables created by
carbon1.1.1+spark1.6 impair performance, or should the data be
regenerated with carbon1.1.1+spark2.1.0 while testing
carbon1.1.1+spark2.1.0?

sunerhan1...@sina.com


Re: ClassNotFound error when inserting into carbon table from hive table

2017-08-22 Thread Liang Chen
Hi lionel

Can you share with us how you fixed this issue?

Regards
Liang


lionel061201 wrote
> This issue has been fixed.
> 
> On Mon, Aug 21, 2017 at 4:04 PM, Lu Cao <whucaolu@> wrote:
> 
>> Hi dev,
>>
>> I'm trying to insert data from a hive table into a carbon table:
>>
>> cc.sql("insert into carbon_test select * from target_table where pt =
>> '20170101'")
>>
>>
>> Does anyone know how to fix this error?
>>
>> [Stage 8:> (0 + 4) / 156]17/08/21 15:59:01 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 8.0 (TID 48, , executor 16): java.lang.ClassNotFoundException: org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD
>>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:82)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>   at java.lang.Class.forName0(Native Method)
>>   at java.lang.Class.forName(Class.java:270)
>>   at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>   at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>>   at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>>   at org.apache.spark.scheduler.Task.run(Task.scala:99)
>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>   at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.ClassNotFoundException: org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD
>>   at java.lang.ClassLoader.findClass(ClassLoader.java:531)
>>   at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>   at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>   at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:77)
>>   ... 23 more
>>
>> [Stage 8:> (0 + 4) / 156]17/08/21 15:59:02 ERROR scheduler.TaskSetManager: Task 1 in stage 8.0 failed 4 times; aborting job
>> 17/08/21 15:59:02 ERROR util.GlobalDictionaryUtil$: main generate global dictionary failed
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 8.0 failed 4 times, most recent failure: Lost task 1.3 in stage 8.0 (TID 61, scsp00382.saicdt.com, executor 16): java.lang.ClassNotFoundException: org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD
>>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:82)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>   at java.lang.Class.forName0(Native Method)
>>   at java.lang.Class.forName(Class.java:270)
>>   at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>   at java.io.ObjectInputStream.readObject0(Objec