Re: OOM with groupBy + saveAsTextFile
Hi,

FYI as follows. Could you post your heap size settings as well as your Spark app code?

Regards,
Arthur

"3.1.3 Detail Message: Requested array size exceeds VM limit. The detail message Requested array size exceeds VM limit indicates that the application (or APIs used by that application) attempted to allocate an array that is larger than the heap size. For example, if an application attempts to allocate an array of 512MB but the maximum heap size is 256MB, then OutOfMemoryError will be thrown with the reason Requested array size exceeds VM limit. In most cases the problem is either a configuration issue (heap size too small), or a bug that results in an application attempting to create a huge array, for example, when the number of elements in the array is computed using an algorithm that computes an incorrect size."

On 2 Nov, 2014, at 12:25 pm, Bharath Ravi Kumar reachb...@gmail.com wrote:

Resurfacing the thread. OOM shouldn't be the norm for a common groupBy/sort use case in a framework that is leading in sorting benchmarks, should it? Or is there something fundamentally wrong in the usage?

On 02-Nov-2014 1:06 am, Bharath Ravi Kumar reachb...@gmail.com wrote:

Hi,

I'm trying to run groupBy(function) followed by saveAsTextFile on an RDD of count ~100 million. The data size is 20GB, and the groupBy results in an RDD of 1061 keys with values of type Iterable<Tuple4<String, Integer, Double, String>>. The job runs on 3 hosts in a standalone setup, with each host's executor having 100G RAM and 24 cores dedicated to it. While the groupBy stage completes successfully with ~24GB of shuffle write, the saveAsTextFile fails after repeated retries, with each attempt failing due to an out of memory error [1].
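The JDK note above is the key point: this error is about the size of a single array, not the total heap. JVM arrays are Int-indexed, so one array (such as the char buffer behind a StringBuilder stringifying one huge record) tops out near Integer.MAX_VALUE elements regardless of -Xmx. A minimal Scala sketch of the constraint; the exact slack below Integer.MAX_VALUE is VM-dependent, so `maxArraySize` here is an assumption, not a JVM API:

```scala
// A single JVM array is capped near Int.MaxValue elements, independent of heap size.
// The "- 8" header slack is a commonly used approximation, not a documented constant.
val maxArraySize: Long = Int.MaxValue - 8

// True if a single array of this many elements could exist at all on a typical VM.
def fitsInOneArray(requestedElements: Long): Boolean =
  requestedElements <= maxArraySize
```

So a 512M-element array fits, while a 4G-element one exceeds the VM limit even on a 100GB heap.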
I understand that a few partitions may be overloaded as a result of the groupBy, and I've tried the following config combinations unsuccessfully:

1) Repartition the initial RDD (44 input partitions but 1061 keys) across 1061 partitions and set max cores = 3, so that each key is a logical partition (though many partitions will end up on very few hosts), and each host likely runs saveAsTextFile on a single key at a time due to max cores = 3 with 3 hosts in the cluster. The level of parallelism is unspecified.

2) Leave max cores unspecified, set the level of parallelism to 72, and leave the number of partitions unspecified (in which case the number of input partitions, 44, was used).

Since I do not intend to cache RDDs, I have set spark.storage.memoryFraction=0.2 in both cases.

My understanding is that if each host is processing a single logical partition to saveAsTextFile and is reading from other hosts to write out the RDD, it is unlikely that it would run out of memory. My interpretation of the Spark tuning guide is that the degree of parallelism has little impact in case (1) above, since max cores = number of hosts. Can someone explain why there are still OOMs with 100G available?

On a related note, intuitively (though I haven't read the source), it appears that an entire key-value pair needn't fit into the memory of a single host for saveAsTextFile, since a single shuffle read from a remote host can be written to HDFS before the next remote read is carried out. This way, not all the data needs to be collected at the same time.

Lastly, if an OOM is (but shouldn't be) a common occurrence (as per the tuning guide and even as per Datastax's Spark introduction), there may need to be more documentation around the internals of Spark to help users make better-informed tuning decisions about parallelism, max cores, number of partitions and other tunables. Is there any ongoing effort on that front?
Thanks,
Bharath

[1] OOM stack trace and logs

14/11/01 12:26:37 WARN TaskSetManager: Lost task 61.0 in stage 36.0 (TID 1264, proc1.foo.bar.com): java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        java.util.Arrays.copyOf(Arrays.java:3326)
        java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
        java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
        java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
        java.lang.StringBuilder.append(StringBuilder.java:136)
        scala.collection.mutable.StringBuilder.append(StringBuilder.scala:197)
        scala.Tuple2.toString(Tuple2.scala:22)
        org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1158)
        org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1158)
        scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:984)
        org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:974)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
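The trace above pinpoints the failure: saveAsTextFile calls Tuple2.toString, so each (key, Iterable) record is rendered into one StringBuilder, and with only 1061 keys over 20GB a single key's values can blow past the maximum array size. A hedged sketch of the alternative shape in plain Scala (`writeGroupStreaming` is a hypothetical helper, not a Spark API): emit one line per value so no single String ever holds a whole group.

```scala
import java.io.PrintWriter

// Stream a group's values out one record at a time, instead of stringifying the
// whole (key, Iterable) pair at once. Peak memory is one value, not one key's data.
def writeGroupStreaming(key: String, values: Iterator[String], out: PrintWriter): Unit =
  values.foreach(v => out.println(s"$key\t$v"))
```

In Spark terms the usual advice is the same idea: flatMap groups back to one record per line before saving, or skip groupBy entirely and use reduceByKey/aggregateByKey when a per-key aggregate is the actual goal, since those combine map-side.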
Spark 1.1.0 on Hive 0.13.1
Hi,

My Hive is 0.13.1. How can I make Spark 1.1.0 run on Hive 0.13? Please advise. Or, any news about when Spark 1.1.0 on Hive 0.13.1 will be available?

Regards,
Arthur

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: Spark 1.1.0 on Hive 0.13.1
Hi,

Thanks for your update. Any idea when Spark 1.2 will be GA?

Regards,
Arthur

On 29 Oct, 2014, at 8:22 pm, Cheng Lian lian.cs@gmail.com wrote:

Spark 1.1.0 doesn't support Hive 0.13.1. We plan to support it in 1.2.0, and related PRs are already merged or being merged to the master branch.

On 10/29/14 7:43 PM, arthur.hk.c...@gmail.com wrote:

Hi, My Hive is 0.13.1. How can I make Spark 1.1.0 run on Hive 0.13? Please advise. Or, any news about when Spark 1.1.0 on Hive 0.13.1 will be available? Regards, Arthur
Re: Spark/HIVE Insert Into values Error
Hi,

I have already found the way to "insert into HIVE_TABLE values (…..)".

Regards,
Arthur

On 18 Oct, 2014, at 10:09 pm, Cheng Lian lian.cs@gmail.com wrote:

Currently Spark SQL uses Hive 0.12.0, which doesn't support the INSERT INTO ... VALUES ... syntax.

On 10/18/14 1:33 AM, arthur.hk.c...@gmail.com wrote:

Hi, When trying to insert records into Hive, I got an error. My Spark is 1.1.0 and Hive is 0.12.0. Any idea what would be wrong? Regards, Arthur

hive> CREATE TABLE students (name VARCHAR(64), age INT, gpa int);
OK
hive> INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1);
NoViableAltException(26@[])
        at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:693)
        at org.apache.hadoop.hive.ql.parse.HiveParser.selectClause(HiveParser.java:31374)
        at org.apache.hadoop.hive.ql.parse.HiveParser.regular_body(HiveParser.java:29083)
        at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatement(HiveParser.java:28968)
        at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:28762)
        at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1238)
        at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:938)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:27 cannot recognize input near 'VALUES' '(' ''fred flintstone'' in select clause
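For reference, the usual workaround on Hive 0.12 is INSERT ... SELECT with literal values rather than the unsupported VALUES clause. A sketch building such a statement in Scala; the one-row helper table `one_row` is hypothetical, and you would need to create and populate it yourself:

```scala
// Hive 0.12 cannot parse INSERT INTO ... VALUES, but it can SELECT constant
// literals from an existing table. `one_row` is an assumed single-row table.
val insertViaSelect: String =
  """INSERT INTO TABLE students
    |SELECT 'fred flintstone', 35, 1 FROM one_row LIMIT 1""".stripMargin
```

The statement could then be submitted through the Hive CLI or sqlContext.sql on a cluster with the helper table in place.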
Re: Spark 1.1.0 and Hive 0.12.0 Compatibility Issue
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

14/10/25 06:50:15 WARN TaskSetManager: Lost task 7.3 in stage 5.0 (TID 575, m137.emblocsoft.net): TaskKilled (killed intentionally)
14/10/25 06:50:15 WARN TaskSetManager: Lost task 24.2 in stage 5.0 (TID 560, m137.emblocsoft.net): TaskKilled (killed intentionally)
14/10/25 06:50:15 WARN TaskSetManager: Lost task 22.2 in stage 5.0 (TID 561, m137.emblocsoft.net): TaskKilled (killed intentionally)
14/10/25 06:50:15 WARN TaskSetManager: Lost task 20.2 in stage 5.0 (TID 564, m137.emblocsoft.net): TaskKilled (killed intentionally)
14/10/25 06:50:15 WARN TaskSetManager: Lost task 13.2 in stage 5.0 (TID 562, m137.emblocsoft.net): TaskKilled (killed intentionally)
14/10/25 06:50:15 WARN TaskSetManager: Lost task 27.2 in stage 5.0 (TID 565, m137.emblocsoft.net): TaskKilled (killed intentionally)
14/10/25 06:50:15 WARN TaskSetManager: Lost task 34.2 in stage 5.0 (TID 568, m137.emblocsoft.net): TaskKilled (killed intentionally)
14/10/25 06:50:15 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool

Regards,
Arthur

On 24 Oct, 2014, at 6:56 am, Michael Armbrust mich...@databricks.com wrote:

Can you show the DDL for the table? It looks like the SerDe might be saying it will produce a decimal type but is actually producing a string.

On Thu, Oct 23, 2014 at 3:17 PM, arthur.hk.c...@gmail.com wrote:

Hi, My Spark is 1.1.0 and Hive is 0.12. I tried to run the same query in both Hive-0.12.0 and Spark-1.1.0; HiveQL works while Spark SQL failed.
hive> select l_orderkey, sum(l_extendedprice*(1-l_discount)) as revenue, o_orderdate, o_shippriority from customer c join orders o on c.c_mktsegment = 'BUILDING' and c.c_custkey = o.o_custkey join lineitem l on l.l_orderkey = o.o_orderkey where o_orderdate < '1995-03-15' and l_shipdate > '1995-03-15' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate limit 10;
Ended Job = job_1414067367860_0011
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 2.0 sec HDFS Read: 261 HDFS Write: 96 SUCCESS
Job 1: Map: 1 Reduce: 1 Cumulative CPU: 0.88 sec HDFS Read: 458 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 880 msec
OK
Time taken: 38.771 seconds

scala> sqlContext.sql("select l_orderkey, sum(l_extendedprice*(1-l_discount)) as revenue, o_orderdate, o_shippriority from customer c join orders o on c.c_mktsegment = 'BUILDING' and c.c_custkey = o.o_custkey join lineitem l on l.l_orderkey = o.o_orderkey where o_orderdate < '1995-03-15' and l_shipdate > '1995-03-15' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate limit 10").collect().foreach
Re: Spark: Order by Failed, java.lang.NullPointerException
Hi,

Added "l_linestatus" and it works, THANK YOU!!

sqlContext.sql("select l_linestatus, l_orderkey, l_linenumber, l_partkey, l_quantity, l_shipdate, L_RETURNFLAG, L_LINESTATUS from lineitem order by L_LINESTATUS limit 10").collect().foreach(println);

14/10/25 07:03:24 INFO DAGScheduler: Stage 12 (takeOrdered at basicOperators.scala:171) finished in 54.358 s
14/10/25 07:03:24 INFO TaskSchedulerImpl: Removed TaskSet 12.0, whose tasks have all completed, from pool
14/10/25 07:03:24 INFO SparkContext: Job finished: takeOrdered at basicOperators.scala:171, took 54.374629175 s
[F,71769288,2,-5859884,13,1993-12-13,R,F]
[F,71769319,4,-4098165,19,1992-10-12,R,F]
[F,71769288,3,2903707,44,1994-10-08,R,F]
[F,71769285,2,-741439,42,1994-04-22,R,F]
[F,71769313,5,-1276467,12,1992-08-15,R,F]
[F,71769314,7,-5595080,13,1992-03-28,A,F]
[F,71769316,1,-1766622,16,1993-12-05,R,F]
[F,71769287,2,-767340,50,1993-06-21,A,F]
[F,71769317,2,665847,15,1992-05-03,A,F]
[F,71769286,1,-5667701,15,1994-04-17,A,F]

Regards,
Arthur

On 24 Oct, 2014, at 2:58 pm, Akhil Das ak...@sigmoidanalytics.com wrote:

Not sure if this would help, but make sure you have the column l_linestatus in the data.

Thanks
Best Regards

On Thu, Oct 23, 2014 at 5:59 AM, arthur.hk.c...@gmail.com wrote:

Hi, I got java.lang.NullPointerException. Please help!
sqlContext.sql("select l_orderkey, l_linenumber, l_partkey, l_quantity, l_shipdate, L_RETURNFLAG, L_LINESTATUS from lineitem limit 10").collect().foreach(println);

2014-10-23 08:20:12,024 INFO [sparkDriver-akka.actor.default-dispatcher-31] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Stage 41 (runJob at basicOperators.scala:136) finished in 0.086 s
2014-10-23 08:20:12,024 INFO [Result resolver thread-1] scheduler.TaskSchedulerImpl (Logging.scala:logInfo(59)) - Removed TaskSet 41.0, whose tasks have all completed, from pool
2014-10-23 08:20:12,024 INFO [main] spark.SparkContext (Logging.scala:logInfo(59)) - Job finished: runJob at basicOperators.scala:136, took 0.090129332 s
[9001,6,-4584121,17,1997-01-04,N,O]
[9002,1,-2818574,23,1996-02-16,N,O]
[9002,2,-2449102,21,1993-12-12,A,F]
[9002,3,-5810699,26,1994-04-06,A,F]
[9002,4,-489283,18,1994-11-11,R,F]
[9002,5,2169683,15,1997-09-14,N,O]
[9002,6,2405081,4,1992-08-03,R,F]
[9002,7,3835341,40,1998-04-28,N,O]
[9003,1,1900071,4,1994-05-05,R,F]
[9004,1,-2614665,41,1993-06-13,A,F]

If "order by L_LINESTATUS" is added, then error:

sqlContext.sql("select l_orderkey, l_linenumber, l_partkey, l_quantity, l_shipdate, L_RETURNFLAG, L_LINESTATUS from lineitem order by L_LINESTATUS limit 10").collect().foreach(println);

2014-10-23 08:22:08,524 INFO [main] parse.ParseDriver (ParseDriver.java:parse(179)) - Parsing command: select l_orderkey, l_linenumber, l_partkey, l_quantity, l_shipdate, L_RETURNFLAG, L_LINESTATUS from lineitem order by L_LINESTATUS limit 10
2014-10-23 08:22:08,525 INFO [main] parse.ParseDriver (ParseDriver.java:parse(197)) - Parse Completed
2014-10-23 08:22:08,526 INFO [main] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 0: get_table : db=boc_12 tbl=lineitem
2014-10-23 08:22:08,526 INFO [main] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=hd ip=unknown-ip-addr cmd=get_table : db=boc_12 tbl=lineitem

java.lang.NullPointerException
        at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1262)
        at org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1269)
        at org.apache.spark.sql.hive.HadoopTableReader.<init>(TableReader.scala:63)
        at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:68)
        at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
        at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
        at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:364)
        at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:184)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
        at org.apache.spark.sql.execution.SparkStrategies$TakeOrdered$.apply(SparkStrategies.scala:191)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58
Re: Spark Hive Snappy Error
Hi,

Removed export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar and it works, THANK YOU!!

Regards,
Arthur

On 23 Oct, 2014, at 1:00 pm, Shao, Saisai saisai.s...@intel.com wrote:

Seems you just added the snappy library into your classpath:

export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar

But Spark itself depends on snappy-0.2.jar. Is there any possibility that this problem is caused by a different version of snappy?

Thanks,
Jerry

From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com]
Sent: Thursday, October 23, 2014 11:32 AM
To: Shao, Saisai
Cc: arthur.hk.c...@gmail.com; user
Subject: Re: Spark Hive Snappy Error

Hi,

Please find the attached file.

my spark-default.xml:

# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
#
# Example:
# spark.master              spark://master:7077
# spark.eventLog.enabled    true
# spark.eventLog.dir        hdfs://namenode:8021/directory
# spark.serializer          org.apache.spark.serializer.KryoSerializer
# spark.executor.memory     2048m
spark.shuffle.spill.compress    false
spark.io.compression.codec      org.apache.spark.io.SnappyCompressionCodec

my spark-env.sh:

#!/usr/bin/env bash
export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/mysql-connector-java-5.1.31-bin.jar
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop}
export SPARK_WORKER_DIR=/edh/hadoop_data/spark_work/
export SPARK_LOG_DIR=/edh/hadoop_logs/spark
export SPARK_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export SPARK_CLASSPATH=$SPARK_HOME/lib_managed/jars/mysql-connector-java-5.1.31-bin.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HBASE_HOME/lib/*:$HIVE_HOME/csv-serde-1.1.2-0.11.0-all.jar:
export SPARK_WORKER_MEMORY=2g
export HADOOP_HEAPSIZE=2000
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=m35:2181,m33:2181,m37:2181"
export SPARK_JAVA_OPTS=" -XX:+UseConcMarkSweepGC"

ll $HADOOP_HOME/lib/native/Linux-amd64-64
-rw-rw-r--. 1 tester tester   50523 Aug 27 14:12 hadoop-auth-2.4.1.jar
-rw-rw-r--. 1 tester tester 1062640 Aug 27 12:19 libhadoop.a
-rw-rw-r--. 1 tester tester 1487564 Aug 27 11:14 libhadooppipes.a
lrwxrwxrwx. 1 tester tester      24 Aug 27 07:08 libhadoopsnappy.so -> libhadoopsnappy.so.0.0.1
lrwxrwxrwx. 1 tester tester      24 Aug 27 07:08 libhadoopsnappy.so.0 -> libhadoopsnappy.so.0.0.1
-rwxr-xr-x. 1 tester tester   54961 Aug 27 07:08 libhadoopsnappy.so.0.0.1
-rwxrwxr-x. 1 tester tester  630328 Aug 27 12:19 libhadoop.so
-rwxrwxr-x. 1 tester tester  630328 Aug 27 12:19 libhadoop.so.1.0.0
-rw-rw-r--. 1 tester tester  582472 Aug 27 11:14 libhadooputils.a
-rw-rw-r--. 1 tester tester  298626 Aug 27 11:14 libhdfs.a
-rwxrwxr-x. 1 tester tester  200370 Aug 27 11:14 libhdfs.so
-rwxrwxr-x. 1 tester tester  200370 Aug 27 11:14 libhdfs.so.0.0.0
lrwxrwxrwx. 1 tester tester      55 Aug 27 07:08 libjvm.so -> /usr/lib/jvm/jdk1.6.0_45/jre/lib/amd64/server/libjvm.so
lrwxrwxrwx. 1 tester tester      25 Aug 27 07:08 libprotobuf-lite.so -> libprotobuf-lite.so.8.0.0
lrwxrwxrwx. 1 tester tester      25 Aug 27 07:08 libprotobuf-lite.so.8 -> libprotobuf-lite.so.8.0.0
-rwxr-xr-x. 1 tester tester  964689 Aug 27 07:08 libprotobuf-lite.so.8.0.0
lrwxrwxrwx. 1 tester tester      20 Aug 27 07:08 libprotobuf.so -> libprotobuf.so.8.0.0
lrwxrwxrwx. 1 tester tester      20 Aug 27 07:08 libprotobuf.so.8 -> libprotobuf.so.8.0.0
-rwxr-xr-x. 1 tester tester 8300050 Aug 27 07:08 libprotobuf.so.8.0.0
lrwxrwxrwx. 1 tester tester      18 Aug 27 07:08 libprotoc.so -> libprotoc.so.8.0.0
lrwxrwxrwx. 1 tester tester      18 Aug 27 07:08 libprotoc.so.8 -> libprotoc.so.8.0.0
-rwxr-xr-x. 1 tester tester 9935810 Aug 27 07:08 libprotoc.so.8.0.0
-rw-r--r--. 1 tester tester  233554 Aug 27 15:19 libsnappy.a
lrwxrwxrwx. 1 tester tester      23 Aug 27 11:32 libsnappy.so -> /usr/lib64/libsnappy.so
lrwxrwxrwx. 1 tester tester      23 Aug 27 11:33 libsnappy.so.1 -> /usr/lib64/libsnappy.so
-rwxr-xr-x. 1 tester tester  147726 Aug 27 07:08 libsnappy.so.1.2.0
drwxr-xr-x. 2 tester tester    4096 Aug 27 07:08 pkgconfig

Regards,
Arthur

On 23 Oct, 2014, at 10:57 am, Shao, Saisai saisai.s...@intel.com wrote:

Hi Arthur,

I think your problem might be different from what SPARK-3958 (https://issues.apache.org/jira/browse/SPARK-3958) mentioned; it seems more likely to be a library link problem. Would you mind checking your Spark runtime to see whether snappy.so is loaded or not (through lsof -p)? I guess your problem is more likely to be a library-not-found problem.

Thanks
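To see which snappy actually wins on the classpath, one low-tech option besides lsof -p is asking the JVM where it loaded a class from. A sketch (this returns None both for classes that are missing and for classes loaded by the bootstrap loader, whose code source is null):

```scala
// Report the jar (or directory) a class was loaded from, useful for spotting
// conflicting library versions such as two snappy jars on the classpath.
def jarOf(className: String): Option[String] =
  try {
    val cls = Class.forName(className)
    Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation.toString)
  } catch { case _: ClassNotFoundException => None }
```

For example, calling jarOf("org.xerial.snappy.Snappy") inside a Spark shell would show which snappy jar the driver is actually using.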
Aggregation Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
Hi,

I got a $TreeNodeException. A few questions:

Q1) How should I do aggregation in Spark? Can I use aggregation directly in SQL?
Q2) Or should I use SQL to load the data to form an RDD and then use Scala to do the aggregation?

Regards,
Arthur

My SQL (good one, without aggregation):

sqlContext.sql("SELECT L_RETURNFLAG FROM LINEITEM WHERE L_SHIPDATE<='1998-09-02' GROUP BY L_RETURNFLAG, L_LINESTATUS ORDER BY L_RETURNFLAG, L_LINESTATUS").collect().foreach(println);
[A]
[N]
[N]
[R]

My SQL (problem SQL, with aggregation):

sqlContext.sql("SELECT L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY) AS SUM_QTY, SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE, SUM(L_EXTENDEDPRICE * ( 1 - L_DISCOUNT )) AS SUM_DISC_PRICE, SUM(L_EXTENDEDPRICE * ( 1 - L_DISCOUNT ) * ( 1 + L_TAX )) AS SUM_CHARGE, AVG(L_QUANTITY) AS AVG_QTY, AVG(L_EXTENDEDPRICE) AS AVG_PRICE, AVG(L_DISCOUNT) AS AVG_DISC, COUNT(*) AS COUNT_ORDER FROM LINEITEM WHERE L_SHIPDATE<='1998-09-02' GROUP BY L_RETURNFLAG, L_LINESTATUS ORDER BY L_RETURNFLAG, L_LINESTATUS").collect().foreach(println);

14/10/23 20:38:31 INFO TaskSchedulerImpl: Removed TaskSet 18.0, whose tasks have all completed, from pool
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: sort, tree:
Sort [l_returnflag#200 ASC,l_linestatus#201 ASC], true
 Exchange (RangePartitioning [l_returnflag#200 ASC,l_linestatus#201 ASC], 200)
  Aggregate false, [l_returnflag#200,l_linestatus#201], [l_returnflag#200,l_linestatus#201,SUM(PartialSum#216) AS sum_qty#181,SUM(PartialSum#217) AS sum_base_price#182,SUM(PartialSum#218) AS sum_disc_price#183,SUM(PartialSum#219) AS sum_charge#184,(CAST(SUM(PartialSum#220), DoubleType) / CAST(SUM(PartialCount#221L), DoubleType)) AS avg_qty#185,(CAST(SUM(PartialSum#222), DoubleType) / CAST(SUM(PartialCount#223L), DoubleType)) AS avg_price#186,(CAST(SUM(PartialSum#224), DoubleType) / CAST(SUM(PartialCount#225L), DoubleType)) AS avg_disc#187,SUM(PartialCount#226L) AS count_order#188L]
   Exchange (HashPartitioning [l_returnflag#200,l_linestatus#201], 200)
    Aggregate true, [l_returnflag#200,l_linestatus#201], [l_returnflag#200,l_linestatus#201,COUNT(l_discount#195) AS PartialCount#225L,SUM(l_discount#195) AS PartialSum#224,COUNT(1) AS PartialCount#226L,SUM((l_extendedprice#194 * (1.0 - l_discount#195))) AS PartialSum#218,SUM(l_extendedprice#194) AS PartialSum#217,COUNT(l_quantity#193) AS PartialCount#221L,SUM(l_quantity#193) AS PartialSum#220,COUNT(l_extendedprice#194) AS PartialCount#223L,SUM(l_extendedprice#194) AS PartialSum#222,SUM(((l_extendedprice#194 * (1.0 - l_discount#195)) * (1.0 + l_tax#196))) AS PartialSum#219,SUM(l_quantity#193) AS PartialSum#216]
     Project [l_extendedprice#194,l_linestatus#201,l_discount#195,l_returnflag#200,l_tax#196,l_quantity#193]
      Filter (l_shipdate#197 <= 1998-09-02)
       HiveTableScan [l_shipdate#197,l_extendedprice#194,l_linestatus#201,l_discount#195,l_returnflag#200,l_tax#196,l_quantity#193], (MetastoreRelation boc_12, lineitem, None), None

        at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
        at org.apache.spark.sql.execution.Sort.execute(basicOperators.scala:191)
        at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:85)
        at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
        at $iwC$$iwC$$iwC.<init>(<console>:20)
        at $iwC$$iwC.<init>(<console>:22)
        at $iwC.<init>(<console>:24)
        at <init>(<console>:26)
        at .<init>(<console>:30)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:859)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:771)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:616)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:624)
        at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:629)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:954)
        at
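On Q1: aggregation directly in SQL is supported. For intuition, this plain-Scala sketch computes the same per-group quantities over a small in-memory sample (sum of quantity, sum of extended price, and row count, from which the averages follow); the `Line` case class is a simplified stand-in for the lineitem schema, not the Hive table itself:

```scala
// Simplified lineitem row: just the columns the aggregation touches.
case class Line(returnflag: String, linestatus: String, quantity: Double, price: Double)

// Group by (returnflag, linestatus) and compute (sum qty, sum price, count),
// mirroring what the SQL's GROUP BY + SUM/COUNT does.
def aggregateLines(lines: Seq[Line]): Map[(String, String), (Double, Double, Long)] =
  lines.groupBy(l => (l.returnflag, l.linestatus)).map { case (k, ls) =>
    k -> ((ls.map(_.quantity).sum, ls.map(_.price).sum, ls.size.toLong))
  }
```

AVG(L_QUANTITY) is then just the quantity sum divided by the count for each group, which is also how the Catalyst plan above decomposes it into PartialSum / PartialCount pairs.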
Re: Aggregation Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
Hi,

My steps to create LINEITEM:

$HADOOP_HOME/bin/hadoop fs -mkdir /tpch/lineitem
$HADOOP_HOME/bin/hadoop fs -copyFromLocal lineitem.tbl /tpch/lineitem/

CREATE EXTERNAL TABLE lineitem (L_ORDERKEY INT, L_PARTKEY INT, L_SUPPKEY INT, L_LINENUMBER INT, L_QUANTITY DOUBLE, L_EXTENDEDPRICE DOUBLE, L_DISCOUNT DOUBLE, L_TAX DOUBLE, L_RETURNFLAG STRING, L_LINESTATUS STRING, L_SHIPDATE STRING, L_COMMITDATE STRING, L_RECEIPTDATE STRING, L_SHIPINSTRUCT STRING, L_SHIPMODE STRING, L_COMMENT STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION '/tpch/lineitem';

Regards,
Arthur

On 23 Oct, 2014, at 9:36 pm, Yin Huai huaiyin@gmail.com wrote:

Hello Arthur,

You can do aggregations in SQL. How did you create LINEITEM?

Thanks,
Yin

On Thu, Oct 23, 2014 at 8:54 AM, arthur.hk.c...@gmail.com wrote:

Hi, I got a $TreeNodeException. A few questions: Q1) How should I do aggregation in Spark? Can I use aggregation directly in SQL? Q2) Or should I use SQL to load the data to form an RDD and then use Scala to do the aggregation?
Regards,
Arthur

[...]
Spark 1.1.0 and Hive 0.12.0 Compatibility Issue
(Please ignore if duplicated)

Hi,

My Spark is 1.1.0 and Hive is 0.12. I tried to run the same query in both Hive-0.12.0 and Spark-1.1.0; HiveQL works while Spark SQL failed.

hive> select l_orderkey, sum(l_extendedprice*(1-l_discount)) as revenue, o_orderdate, o_shippriority from customer c join orders o on c.c_mktsegment = 'BUILDING' and c.c_custkey = o.o_custkey join lineitem l on l.l_orderkey = o.o_orderkey where o_orderdate < '1995-03-15' and l_shipdate > '1995-03-15' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate limit 10;
Ended Job = job_1414067367860_0011
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 2.0 sec HDFS Read: 261 HDFS Write: 96 SUCCESS
Job 1: Map: 1 Reduce: 1 Cumulative CPU: 0.88 sec HDFS Read: 458 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 880 msec
OK
Time taken: 38.771 seconds

scala> sqlContext.sql("select l_orderkey, sum(l_extendedprice*(1-l_discount)) as revenue, o_orderdate, o_shippriority from customer c join orders o on c.c_mktsegment = 'BUILDING' and c.c_custkey = o.o_custkey join lineitem l on l.l_orderkey = o.o_orderkey where o_orderdate < '1995-03-15' and l_shipdate > '1995-03-15' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate limit 10").collect().foreach(println);

org.apache.spark.SparkException: Job aborted due to stage failure: Task 14 in stage 5.0 failed 4 times, most recent failure: Lost task 14.3 in stage 5.0 (TID 568, m34): java.lang.ClassCastException: java.lang.String cannot be cast to scala.math.BigDecimal
        scala.math.Numeric$BigDecimalIsFractional$.minus(Numeric.scala:182)
        org.apache.spark.sql.catalyst.expressions.Subtract$$anonfun$eval$3.apply(arithmetic.scala:64)
        org.apache.spark.sql.catalyst.expressions.Subtract$$anonfun$eval$3.apply(arithmetic.scala:64)
        org.apache.spark.sql.catalyst.expressions.Expression.n2(Expression.scala:114)
        org.apache.spark.sql.catalyst.expressions.Subtract.eval(arithmetic.scala:64)
org.apache.spark.sql.catalyst.expressions.Expression.n2(Expression.scala:108) org.apache.spark.sql.catalyst.expressions.Multiply.eval(arithmetic.scala:70) org.apache.spark.sql.catalyst.expressions.Coalesce.eval(nullFunctions.scala:47) org.apache.spark.sql.catalyst.expressions.Expression.n2(Expression.scala:108) org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:58) org.apache.spark.sql.catalyst.expressions.MutableLiteral.update(literals.scala:69) org.apache.spark.sql.catalyst.expressions.SumFunction.update(aggregates.scala:433) org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:167) org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:151) org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:745) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174) at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688) at
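The ClassCastException above suggests Catalyst read l_extendedprice/l_discount as strings while the SUM aggregate expected decimals. A possible workaround (a sketch only, untested against this dataset; whether it avoids the underlying 1.1.0 bug is an assumption) is to cast the numeric columns explicitly:

```scala
// Hedged sketch: cast the price columns so Catalyst's Subtract/Multiply
// operate on doubles instead of strings. Run in the Spark shell against
// the same Hive tables as the failing query.
val q3 = """
  select l_orderkey,
         sum(cast(l_extendedprice as double) * (1 - cast(l_discount as double))) as revenue,
         o_orderdate, o_shippriority
  from customer c
  join orders o   on c.c_mktsegment = 'BUILDING' and c.c_custkey = o.o_custkey
  join lineitem l on l.l_orderkey = o.o_orderkey
  where o_orderdate < '1995-03-15' and l_shipdate > '1995-03-15'
  group by l_orderkey, o_orderdate, o_shippriority
  order by revenue desc, o_orderdate
  limit 10"""
sqlContext.sql(q3).collect().foreach(println)
```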
Re: Spark Hive Snappy Error
(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 2014-10-22 20:23:17,038 INFO [sparkDriver-akka.actor.default-dispatcher-14] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Shutting down remote daemon. 2014-10-22 20:23:17,039 INFO [sparkDriver-akka.actor.default-dispatcher-14] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Remote daemon shut down; proceeding with flushing remote transports. Regards Arthur On 17 Oct, 2014, at 9:33 am, Shao, Saisai saisai.s...@intel.com wrote: Hi Arthur, I think this is a known issue in Spark; you can check https://issues.apache.org/jira/browse/SPARK-3958. I'm curious about it: can you always reproduce this issue? Is it related to some specific data sets? Would you mind giving me some information about your workload, Spark configuration, JDK version and OS version? Thanks Jerry From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com] Sent: Friday, October 17, 2014 7:13 AM To: user Cc: arthur.hk.c...@gmail.com Subject: Spark Hive Snappy Error Hi, When trying Spark with a Hive table, I got the "java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I" error:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("select count(1) from q8_national_market_share").collect().foreach(println)
java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method) at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316) at org.xerial.snappy.SnappyOutputStream.init(SnappyOutputStream.java:79) at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125) at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207) at
org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83) at org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:68) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809) at org.apache.spark.sql.hive.HadoopTableReader.init(TableReader.scala:68) at org.apache.spark.sql.hive.execution.HiveTableScan.init(HiveTableScan.scala:68) at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188) at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188) at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:364) at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:184) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) at org.apache.spark.sql.execution.SparkStrategies$HashAggregation$.apply(SparkStrategies.scala:146) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:406) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:406) at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438) at $iwC$$iwC$$iwC$$iwC.init(console:15) at $iwC$$iwC$$iwC.init(console:20) at $iwC$$iwC.init(console:22) at $iwC.init(console:24) at init(console:26) at .init(console:30
Re: Spark Hive Snappy Error
Hi, FYI, I use snappy-java-1.0.4.1.jar Regards Arthur On 22 Oct, 2014, at 8:59 pm, Shao, Saisai saisai.s...@intel.com wrote: Thanks a lot, I will try to reproduce this in my local settings and dig into the details; thanks for your information. BR Jerry From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com] Sent: Wednesday, October 22, 2014 8:35 PM To: Shao, Saisai Cc: arthur.hk.c...@gmail.com; user Subject: Re: Spark Hive Snappy Error Hi, Yes, I can always reproduce the issue. About my workload, Spark configuration, JDK version and OS version: I ran SparkPi 1000.
java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
cat /etc/centos-release
CentOS release 6.5 (Final)
My Spark's hive-site.xml contains the following:
<property><name>hive.exec.compress.output</name><value>true</value></property>
<property><name>mapred.output.compression.codec</name><value>org.apache.hadoop.io.compress.SnappyCodec</value></property>
<property><name>mapred.output.compression.type</name><value>BLOCK</value></property>
e.g.
MASTER=spark://m1:7077,m2:7077 ./bin/run-example SparkPi 1000 2014-10-22 20:23:17,033 ERROR [sparkDriver-akka.actor.default-dispatcher-18] actor.ActorSystemImpl (Slf4jLogger.scala:apply$mcV$sp(66)) - Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-2] shutting down ActorSystem [sparkDriver] java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method) at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316) at org.xerial.snappy.SnappyOutputStream.init(SnappyOutputStream.java:79) at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125) at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207) at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83) at org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:68) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:829) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:769) at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:753) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1360) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 2014-10-22 20:23:17,036 INFO [main] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Failed to run reduce at SparkPi.scala:35 Exception in thread main org.apache.spark.SparkException: Job cancelled because SparkContext was shut down at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:694) at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:693) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:693) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1399) at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201) at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163) at akka.actor.ActorCell.terminate(ActorCell.scala:338) at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431
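Since the SparkPi run dies inside SnappyCompressionCodec before any Hive code is touched, the native/jar setup itself is worth checking. A quick diagnostic sketch (all paths are taken from the listings quoted later in this thread and are assumptions for any other cluster):

```shell
# Diagnostic sketch -- adjust paths to your layout.
# 1) Does the native directory actually contain the snappy libraries?
ls -l "$HADOOP_HOME/lib/native/Linux-amd64-64" | grep -i snappy
# 2) Is that directory visible to the JVMs Spark launches?
echo "$LD_LIBRARY_PATH" | tr ':' '\n'
# 3) Is more than one snappy-java jar on the classpath? Mixed versions are a
#    common cause of UnsatisfiedLinkError on native method lookups.
find "$SPARK_HOME" "$HADOOP_HOME" -name 'snappy-java-*.jar' 2>/dev/null
```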
ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId
Hi, I just tried sample PI calculation on Spark Cluster, after returning the Pi result, it shows ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(m37,35662) not found ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://m33:7077 --executor-memory 512m --total-executor-cores 40 examples/target/spark-examples_2.10-1.1.0.jar 100 14/10/23 05:09:03 INFO TaskSetManager: Finished task 87.0 in stage 0.0 (TID 87) in 346 ms on m134.emblocsoft.net (99/100) 14/10/23 05:09:03 INFO TaskSetManager: Finished task 98.0 in stage 0.0 (TID 98) in 262 ms on m134.emblocsoft.net (100/100) 14/10/23 05:09:03 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 14/10/23 05:09:03 INFO DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in 2.597 s 14/10/23 05:09:03 INFO SparkContext: Job finished: reduce at SparkPi.scala:35, took 2.725328861 s Pi is roughly 3.1414948 14/10/23 05:09:03 INFO SparkUI: Stopped Spark web UI at http://m33:4040 14/10/23 05:09:03 INFO DAGScheduler: Stopping DAGScheduler 14/10/23 05:09:03 INFO SparkDeploySchedulerBackend: Shutting down all executors 14/10/23 05:09:03 INFO SparkDeploySchedulerBackend: Asking each executor to shut down 14/10/23 05:09:04 INFO ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@37852165 14/10/23 05:09:04 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(m37,35662) 14/10/23 05:09:04 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(m37,35662) 14/10/23 05:09:04 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(m37,35662) not found 14/10/23 05:09:04 INFO ConnectionManager: key already cancelled ? 
sun.nio.ch.SelectionKeyImpl@37852165 java.nio.channels.CancelledKeyException at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386) at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139) 14/10/23 05:09:04 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(m36,34230) 14/10/23 05:09:04 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(m35,50371) 14/10/23 05:09:04 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(m36,34230) 14/10/23 05:09:04 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(m34,41562) 14/10/23 05:09:04 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(m36,34230) not found 14/10/23 05:09:04 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(m35,50371) 14/10/23 05:09:04 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(m35,50371) not found 14/10/23 05:09:04 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(m33,39517) 14/10/23 05:09:04 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(m33,39517) 14/10/23 05:09:04 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(m34,41562) 14/10/23 05:09:04 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(m33,39517) not found 14/10/23 05:09:04 ERROR SendingConnection: Exception while reading SendingConnection to ConnectionManagerId(m34,41562) java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295) at org.apache.spark.network.SendingConnection.read(Connection.scala:390) at org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 14/10/23 05:09:04 INFO ConnectionManager: Handling connection error on connection to ConnectionManagerId(m34,41562) 14/10/23 05:09:04 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(m34,41562) 14/10/23 05:09:04 INFO ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@2e0b5c4a 14/10/23 05:09:04 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@2e0b5c4a java.nio.channels.CancelledKeyException at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386) at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139) 14/10/23 05:09:04 INFO ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@653f8844 14/10/23 05:09:04 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@653f8844 java.nio.channels.CancelledKeyException at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386) at
Re: ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId
Hi, I have managed to resolve it; it was caused by a wrong setting. Please ignore this thread. Regards Arthur On 23 Oct, 2014, at 5:14 am, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: 14/10/23 05:09:04 WARN ConnectionManager: All connections not cleaned up
Spark: Order by Failed, java.lang.NullPointerException
Hi, I got a java.lang.NullPointerException. Please help!
sqlContext.sql("select l_orderkey, l_linenumber, l_partkey, l_quantity, l_shipdate, L_RETURNFLAG, L_LINESTATUS from lineitem limit 10").collect().foreach(println);
2014-10-23 08:20:12,024 INFO [sparkDriver-akka.actor.default-dispatcher-31] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Stage 41 (runJob at basicOperators.scala:136) finished in 0.086 s 2014-10-23 08:20:12,024 INFO [Result resolver thread-1] scheduler.TaskSchedulerImpl (Logging.scala:logInfo(59)) - Removed TaskSet 41.0, whose tasks have all completed, from pool 2014-10-23 08:20:12,024 INFO [main] spark.SparkContext (Logging.scala:logInfo(59)) - Job finished: runJob at basicOperators.scala:136, took 0.090129332 s
[9001,6,-4584121,17,1997-01-04,N,O]
[9002,1,-2818574,23,1996-02-16,N,O]
[9002,2,-2449102,21,1993-12-12,A,F]
[9002,3,-5810699,26,1994-04-06,A,F]
[9002,4,-489283,18,1994-11-11,R,F]
[9002,5,2169683,15,1997-09-14,N,O]
[9002,6,2405081,4,1992-08-03,R,F]
[9002,7,3835341,40,1998-04-28,N,O]
[9003,1,1900071,4,1994-05-05,R,F]
[9004,1,-2614665,41,1993-06-13,A,F]
If "order by L_LINESTATUS" is added, then it fails:
sqlContext.sql("select l_orderkey, l_linenumber, l_partkey, l_quantity, l_shipdate, L_RETURNFLAG, L_LINESTATUS from lineitem order by L_LINESTATUS limit 10").collect().foreach(println);
2014-10-23 08:22:08,524 INFO [main] parse.ParseDriver (ParseDriver.java:parse(179)) - Parsing command: select l_orderkey, l_linenumber, l_partkey, l_quantity, l_shipdate, L_RETURNFLAG, L_LINESTATUS from lineitem order by L_LINESTATUS limit 10 2014-10-23 08:22:08,525 INFO [main] parse.ParseDriver (ParseDriver.java:parse(197)) - Parse Completed 2014-10-23 08:22:08,526 INFO [main] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 0: get_table : db=boc_12 tbl=lineitem 2014-10-23 08:22:08,526 INFO [main] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=hd ip=unknown-ip-addr cmd=get_table : db=boc_12 tbl=lineitem
java.lang.NullPointerException at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1262) at org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1269) at org.apache.spark.sql.hive.HadoopTableReader.init(TableReader.scala:63) at org.apache.spark.sql.hive.execution.HiveTableScan.init(HiveTableScan.scala:68) at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188) at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188) at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:364) at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:184) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) at org.apache.spark.sql.execution.SparkStrategies$TakeOrdered$.apply(SparkStrategies.scala:191) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:406) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:406) at 
org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438) at $iwC$$iwC$$iwC$$iwC.init(console:15) at $iwC$$iwC$$iwC.init(console:20) at $iwC$$iwC.init(console:22) at $iwC.init(console:24) at init(console:26) at .init(console:30) at .clinit(console) at .init(console:7) at .clinit(console) at $print(console) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at
Re: Spark Hive Snappy Error
Hi, Please find the attached file.

lsof -p 16459 (Master)
COMMAND PID   USER   FD  TYPE DEVICE SIZE/OFF  NODE    NAME
java    16459 tester cwd DIR  253,2  4096      6039786 /hadoop/spark-1.1.0_patched
java    16459 tester rtd DIR  253,0  4096      2       /
java    16459 tester txt REG  253,0  12150     2780995 /usr/lib/jvm/jdk1.7.0_67/bin/java
java    16459 tester mem REG  253,0  156928    2228230 /lib64/ld-2.12.so
java    16459 tester mem REG  253,0  1926680   2228250 /lib64/libc-2.12.so
java    16459 tester mem REG  253,0  145896    2228251 /lib64/libpthread-2.12.so
java    16459 tester mem REG  253,0  22536     2228254 /lib64/libdl-2.12.so
java    16459 tester mem REG  253,0  109006    2759278 /usr/lib/jvm/jdk1.7.0_67/lib/amd64/jli/libjli.so
java    16459 tester mem REG  253,0  599384    2228264 /lib64/libm-2.12.so
java    16459 tester mem REG  253,0  47064     2228295 /lib64/librt-2.12.so
java    16459 tester mem REG  253,0  113952    2228328 /lib64/libresolv-2.12.so
java    16459 tester mem REG  253,0  99158576  2388225 /usr/lib/locale/locale-archive
java    16459 tester mem REG  253,0  27424     2228249 /lib64/libnss_dns-2.12.so
java    16459 tester mem REG  253,2  138832345 6555616 /hadoop/spark-1.1.0_patched/assembly/target/scala-2.10/spark-assembly-1.1.0-hadoop2.4.1.jar
java    16459 tester mem REG  253,0  580624    2893171 /usr/lib/jvm/jdk1.7.0_67/jre/lib/jsse.jar
java    16459 tester mem REG  253,0  114742    2893221 /usr/lib/jvm/jdk1.7.0_67/jre/lib/amd64/libnet.so
java    16459 tester mem REG  253,0  91178     2893222 /usr/lib/jvm/jdk1.7.0_67/jre/lib/amd64/libnio.so
java    16459 tester mem REG  253,2  1769726   6816963 /hadoop/spark-1.1.0_patched/lib_managed/jars/datanucleus-rdbms-3.2.1.jar
java    16459 tester mem REG  253,2  337012    6816961 /hadoop/spark-1.1.0_patched/lib_managed/jars/datanucleus-api-jdo-3.2.1.jar
java    16459 tester mem REG  253,2  1801810   6816962 /hadoop/spark-1.1.0_patched/lib_managed/jars/datanucleus-core-3.2.2.jar
java    16459 tester mem REG  253,2  25153     7079998 /hadoop/hive-0.12.0-bin/csv-serde-1.1.2-0.11.0-all.jar
java    16459 tester mem REG  253,2  21817     6032989 /hadoop/hbase-0.98.5-hadoop2/lib/gmbal-api-only-3.0.0-b023.jar
java    16459 tester mem REG  253,2  177131    6032940 /hadoop/hbase-0.98.5-hadoop2/lib/jetty-util-6.1.26.jar
java    16459 tester mem REG  253,2  32677     6032915 /hadoop/hbase-0.98.5-hadoop2/lib/hbase-hadoop-compat-0.98.5-hadoop2.jar
java    16459 tester mem REG  253,2  143602    6032959 /hadoop/hbase-0.98.5-hadoop2/lib/commons-digester-1.8.jar
java    16459 tester mem REG  253,2  97738     6032917 /hadoop/hbase-0.98.5-hadoop2/lib/hbase-prefix-tree-0.98.5-hadoop2.jar
java    16459 tester mem REG  253,2  17884     6032949 /hadoop/hbase-0.98.5-hadoop2/lib/jackson-jaxrs-1.8.8.jar
java    16459 tester mem REG  253,2  253086    6032987 /hadoop/hbase-0.98.5-hadoop2/lib/grizzly-http-2.1.2.jar
java    16459 tester mem REG  253,2  73778     6032916 /hadoop/hbase-0.98.5-hadoop2/lib/hbase-hadoop2-compat-0.98.5-hadoop2.jar
java    16459 tester mem REG  253,2  336904    6032993 /hadoop/hbase-0.98.5-hadoop2/lib/grizzly-http-servlet-2.1.2.jar
java    16459 tester mem REG  253,2  927415    6032914 /hadoop/hbase-0.98.5-hadoop2/lib/hbase-client-0.98.5-hadoop2.jar
java    16459 tester mem REG  253,2  125740    6033008 /hadoop/hbase-0.98.5-hadoop2/lib/hadoop-yarn-server-applicationhistoryservice-2.4.1.jar
java    16459 tester mem REG  253,2  15010     6032936 /hadoop/hbase-0.98.5-hadoop2/lib/xmlenc-0.52.jar
java    16459 tester mem REG  253,2  60686     6032926 /hadoop/hbase-0.98.5-hadoop2/lib/commons-logging-1.1.1.jar
java    16459 tester mem REG  253,2  259600    6032927 /hadoop/hbase-0.98.5-hadoop2/lib/commons-codec-1.7.jar
java    16459 tester mem REG  253,2  321806    6032957 /hadoop/hbase-0.98.5-hadoop2/lib/jets3t-0.6.1.jar
java    16459 tester mem REG  253,2  85353     6032982 /hadoop/hbase-0.98.5-hadoop2/lib/javax.servlet-api-3.0.1.jar
java    16459 tester mem
Re: Spark Hive Snappy Error
Hi, may I know where to configure Spark to load libhadoop.so? Regards Arthur On 23 Oct, 2014, at 11:31 am, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, Please find the attached file. lsof.rtf

My spark-defaults.conf:
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
#
# Example:
# spark.master            spark://master:7077
# spark.eventLog.enabled  true
# spark.eventLog.dir      hdfs://namenode:8021/directory
# spark.serializer        org.apache.spark.serializer.KryoSerializer
# spark.executor.memory   2048m
spark.shuffle.spill.compress  false
spark.io.compression.codec    org.apache.spark.io.SnappyCompressionCodec

My spark-env.sh:
#!/usr/bin/env bash
export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/mysql-connector-java-5.1.31-bin.jar
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop}
export SPARK_WORKER_DIR=/edh/hadoop_data/spark_work/
export SPARK_LOG_DIR=/edh/hadoop_logs/spark
export SPARK_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export SPARK_CLASSPATH=$SPARK_HOME/lib_managed/jars/mysql-connector-java-5.1.31-bin.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HBASE_HOME/lib/*:$HIVE_HOME/csv-serde-1.1.2-0.11.0-all.jar:
export SPARK_WORKER_MEMORY=2g
export HADOOP_HEAPSIZE=2000
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=m35:2181,m33:2181,m37:2181"
export SPARK_JAVA_OPTS=" -XX:+UseConcMarkSweepGC"

ll $HADOOP_HOME/lib/native/Linux-amd64-64
-rw-rw-r--. 1 tester tester   50523 Aug 27 14:12 hadoop-auth-2.4.1.jar
-rw-rw-r--. 1 tester tester 1062640 Aug 27 12:19 libhadoop.a
-rw-rw-r--. 1 tester tester 1487564 Aug 27 11:14 libhadooppipes.a
lrwxrwxrwx. 1 tester tester      24 Aug 27 07:08 libhadoopsnappy.so -> libhadoopsnappy.so.0.0.1
lrwxrwxrwx. 1 tester tester      24 Aug 27 07:08 libhadoopsnappy.so.0 -> libhadoopsnappy.so.0.0.1
-rwxr-xr-x. 1 tester tester   54961 Aug 27 07:08 libhadoopsnappy.so.0.0.1
-rwxrwxr-x. 1 tester tester  630328 Aug 27 12:19 libhadoop.so
-rwxrwxr-x. 1 tester tester  630328 Aug 27 12:19 libhadoop.so.1.0.0
-rw-rw-r--. 1 tester tester  582472 Aug 27 11:14 libhadooputils.a
-rw-rw-r--. 1 tester tester  298626 Aug 27 11:14 libhdfs.a
-rwxrwxr-x. 1 tester tester  200370 Aug 27 11:14 libhdfs.so
-rwxrwxr-x. 1 tester tester  200370 Aug 27 11:14 libhdfs.so.0.0.0
lrwxrwxrwx. 1 tester tester      55 Aug 27 07:08 libjvm.so -> /usr/lib/jvm/jdk1.6.0_45/jre/lib/amd64/server/libjvm.so
lrwxrwxrwx. 1 tester tester      25 Aug 27 07:08 libprotobuf-lite.so -> libprotobuf-lite.so.8.0.0
lrwxrwxrwx. 1 tester tester      25 Aug 27 07:08 libprotobuf-lite.so.8 -> libprotobuf-lite.so.8.0.0
-rwxr-xr-x. 1 tester tester  964689 Aug 27 07:08 libprotobuf-lite.so.8.0.0
lrwxrwxrwx. 1 tester tester      20 Aug 27 07:08 libprotobuf.so -> libprotobuf.so.8.0.0
lrwxrwxrwx. 1 tester tester      20 Aug 27 07:08 libprotobuf.so.8 -> libprotobuf.so.8.0.0
-rwxr-xr-x. 1 tester tester 8300050 Aug 27 07:08 libprotobuf.so.8.0.0
lrwxrwxrwx. 1 tester tester      18 Aug 27 07:08 libprotoc.so -> libprotoc.so.8.0.0
lrwxrwxrwx. 1 tester tester      18 Aug 27 07:08 libprotoc.so.8 -> libprotoc.so.8.0.0
-rwxr-xr-x. 1 tester tester 9935810 Aug 27 07:08 libprotoc.so.8.0.0
-rw-r--r--. 1 tester tester  233554 Aug 27 15:19 libsnappy.a
lrwxrwxrwx. 1 tester tester      23 Aug 27 11:32 libsnappy.so -> /usr/lib64/libsnappy.so
lrwxrwxrwx. 1 tester tester      23 Aug 27 11:33 libsnappy.so.1 -> /usr/lib64/libsnappy.so
-rwxr-xr-x. 1 tester tester  147726 Aug 27 07:08 libsnappy.so.1.2.0
drwxr-xr-x. 2 tester tester    4096 Aug 27 07:08 pkgconfig

Regards Arthur On 23 Oct, 2014, at 10:57 am, Shao, Saisai saisai.s...@intel.com wrote: Hi Arthur, I think your problem might be different from what SPARK-3958 (https://issues.apache.org/jira/browse/SPARK-3958) mentioned; it seems your problem is more likely to be a library link problem. Would you mind checking your Spark runtime to see if the snappy.so is loaded or not?
(through lsof -p). I guess your problem is more likely to be a library not found problem. Thanks Jerry
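On the libhadoop.so question: in Spark 1.x standalone mode, native libraries are normally exposed through the library-path settings. A sketch (the directory is the one from the listing above on this cluster; treat it as an assumption elsewhere, and note this may or may not be the fix for the Snappy error itself):

```shell
# spark-env.sh -- expose the Hadoop native libs to Spark's JVMs (sketch)
export SPARK_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64:$LD_LIBRARY_PATH

# or per application, at submit time:
# ./bin/spark-submit --driver-library-path $HADOOP_HOME/lib/native/Linux-amd64-64 ...
```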
Spark/HIVE Insert Into values Error
Hi, When trying to insert records into Hive, I got an error. My Spark is 1.1.0 and Hive is 0.12.0. Any idea what would be wrong? Regards Arthur
hive> CREATE TABLE students (name VARCHAR(64), age INT, gpa int);
OK
hive> INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1);
NoViableAltException(26@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:693) at org.apache.hadoop.hive.ql.parse.HiveParser.selectClause(HiveParser.java:31374) at org.apache.hadoop.hive.ql.parse.HiveParser.regular_body(HiveParser.java:29083) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatement(HiveParser.java:28968) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:28762) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1238) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:938) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:27 cannot recognize input near 'VALUES' '(' ''fred flintstone'' in select clause
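The parse failure is expected on this version: Hive 0.12 does not support INSERT INTO ... VALUES, which was only added in Hive 0.14. A workaround sketch (the helper table name and file path below are placeholders, not names from this thread):

```sql
-- Hive 0.12 workaround sketch: use INSERT ... SELECT instead of INSERT ... VALUES.
-- 'some_existing_table' is any table that already has at least one row.
INSERT INTO TABLE students
SELECT 'fred flintstone', 35, 1 FROM some_existing_table LIMIT 1;

-- Or stage the row in a local file whose field delimiter matches the table's
-- (note the default text-table delimiter is \001, not a comma):
LOAD DATA LOCAL INPATH '/tmp/students.txt' INTO TABLE students;
```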
Spark Hive Snappy Error
Hi, When trying Spark with a Hive table, I got the "java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I" error:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("select count(1) from q8_national_market_share").collect().foreach(println)
java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method) at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316) at org.xerial.snappy.SnappyOutputStream.init(SnappyOutputStream.java:79) at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125) at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207) at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83) at org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:68) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809) at org.apache.spark.sql.hive.HadoopTableReader.init(TableReader.scala:68) at org.apache.spark.sql.hive.execution.HiveTableScan.init(HiveTableScan.scala:68) at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188) at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188) at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:364) at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:184) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) at org.apache.spark.sql.execution.SparkStrategies$HashAggregation$.apply(SparkStrategies.scala:146) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:406) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:406) at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438) at $iwC$$iwC$$iwC$$iwC.init(console:15) at $iwC$$iwC$$iwC.init(console:20) at $iwC$$iwC.init(console:22) at $iwC.init(console:24) at init(console:26) at .init(console:30) at .clinit(console) at .init(console:7) at .clinit(console) at $print(console) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615) at 
org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:859) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:771) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:616) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:624) at
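A frequent cause of this UnsatisfiedLinkError is that snappy-java fails to extract and load its native library (for example when the temp directory is mounted noexec). Two commonly suggested workarounds are sketched below; the directory name and the local spark-defaults.conf copy are illustrative assumptions, not the poster's actual setup:

```shell
# Workaround 1 (assumption: /tmp is mounted noexec): give snappy-java an
# executable directory to extract its native library into.
SNAPPY_TMP="$HOME/snappy-tmp"                      # example path
mkdir -p "$SNAPPY_TMP"
export SPARK_JAVA_OPTS="-Dorg.xerial.snappy.tempdir=$SNAPPY_TMP"

# Workaround 2: side-step Snappy entirely by switching Spark's compression
# codec to LZF. Written to a local copy here for illustration; the real file
# is $SPARK_HOME/conf/spark-defaults.conf.
CONF=./spark-defaults.conf
echo "spark.io.compression.codec org.apache.spark.io.LZFCompressionCodec" >> "$CONF"
```

Either way, restarting the shell/executors is needed for the setting to take effect.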
Re: How To Implement More Than One Subquery in Scala/Spark
Hi, Thank you so much! By the way, what is the DATEADD function in Scala/Spark? Or how to implement DATEADD(MONTH, 3, '2013-07-01') and DATEADD(YEAR, 1, '2014-01-01') in Spark or Hive? Regards Arthur On 12 Oct, 2014, at 12:03 pm, Ilya Ganelin ilgan...@gmail.com wrote: Because of how closures work in Scala, there is no support for nested map/RDD-based operations. Specifically, if you have Context a { Context b { } } Operations within context b, when distributed across nodes, will no longer have visibility of variables specific to context a, because that context is not distributed alongside that operation! To get around this you need to serialize your operations. For example, run a map job, take its output, and run a second map job to filter. Another option is to run two separate map jobs and join their results. Keep in mind that another useful technique is to execute the groupByKey routine, particularly if you want to operate on a particular variable. On Oct 11, 2014 11:09 AM, arthur.hk.c...@gmail.com wrote: Hi, My Spark version is v1.1.0 and Hive is 0.12.0. I need to use more than one subquery in my Spark SQL; below are my sample table structures and a SQL that contains more than one subquery. Question 1: How to load a Hive table into Scala/Spark? Question 2: How to implement a SQL with more than one subquery in Scala/Spark? Question 3: What is the DATEADD function in Scala/Spark? Or how to implement DATEADD(MONTH, 3, '2013-07-01') and DATEADD(YEAR, 1, '2014-01-01') in Spark or Hive? I can find Hive's date_add(string startdate, int days), but it works in days, not MONTH / YEAR. Thanks. 
Regards Arthur === My sample SQL with more than one subquery: SELECT S_NAME, COUNT(*) AS NUMWAIT FROM SUPPLIER, LINEITEM L1, ORDERS WHERE S_SUPPKEY = L1.L_SUPPKEY AND O_ORDERKEY = L1.L_ORDERKEY AND O_ORDERSTATUS = 'F' AND L1.L_RECEIPTDATE > L1.L_COMMITDATE AND EXISTS (SELECT * FROM LINEITEM L2 WHERE L2.L_ORDERKEY = L1.L_ORDERKEY AND L2.L_SUPPKEY <> L1.L_SUPPKEY) AND NOT EXISTS (SELECT * FROM LINEITEM L3 WHERE L3.L_ORDERKEY = L1.L_ORDERKEY AND L3.L_SUPPKEY <> L1.L_SUPPKEY AND L3.L_RECEIPTDATE > L3.L_COMMITDATE) GROUP BY S_NAME ORDER BY NUMWAIT DESC, S_NAME LIMIT 100; === Supplier Table: CREATE TABLE IF NOT EXISTS SUPPLIER ( S_SUPPKEY INTEGER PRIMARY KEY, S_NAME CHAR(25), S_ADDRESS VARCHAR(40), S_NATIONKEY BIGINT NOT NULL, S_PHONE CHAR(15), S_ACCTBAL DECIMAL, S_COMMENT VARCHAR(101) ) === Order Table: CREATE TABLE IF NOT EXISTS ORDERS ( O_ORDERKEY INTEGER PRIMARY KEY, O_CUSTKEY BIGINT NOT NULL, O_ORDERSTATUS CHAR(1), O_TOTALPRICE DECIMAL, O_ORDERDATE CHAR(10), O_ORDERPRIORITY CHAR(15), O_CLERK CHAR(15), O_SHIPPRIORITY INTEGER, O_COMMENT VARCHAR(79) ) === LineItem Table: CREATE TABLE IF NOT EXISTS LINEITEM ( L_ORDERKEY BIGINT NOT NULL, L_PARTKEY BIGINT, L_SUPPKEY BIGINT, L_LINENUMBER INTEGER NOT NULL, L_QUANTITY DECIMAL, L_EXTENDEDPRICE DECIMAL, L_DISCOUNT DECIMAL, L_TAX DECIMAL, L_SHIPDATE CHAR(10), L_COMMITDATE CHAR(10), L_RECEIPTDATE CHAR(10), L_RETURNFLAG CHAR(1), L_LINESTATUS CHAR(1), L_SHIPINSTRUCT CHAR(25), L_SHIPMODE CHAR(10), L_COMMENT VARCHAR(44), CONSTRAINT pk PRIMARY KEY (L_ORDERKEY, L_LINENUMBER) )
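Hive 0.12 only ships date_add/date_sub in days, so DATEADD(MONTH, ...) / DATEADD(YEAR, ...) would need a UDF or manual date arithmetic. As a reference for the intended semantics, the two DATEADD calls asked about can be pinned down with GNU date:

```shell
# DATEADD(MONTH, 3, '2013-07-01') and DATEADD(YEAR, 1, '2014-01-01'),
# evaluated with GNU date's relative-item syntax:
date -d '2013-07-01 +3 months' '+%Y-%m-%d'   # 2013-10-01
date -d '2014-01-01 +1 year'   '+%Y-%m-%d'   # 2015-01-01
```

A Hive UDF (or, in later Hive versions, add_months) would have to reproduce exactly these results.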
How To Implement More Than One Subquery in Scala/Spark
Hi, My Spark version is v1.1.0 and Hive is 0.12.0. I need to use more than one subquery in my Spark SQL; below are my sample table structures and a SQL that contains more than one subquery. Question 1: How to load a Hive table into Scala/Spark? Question 2: How to implement a SQL with more than one subquery in Scala/Spark? Question 3: What is the DATEADD function in Scala/Spark? Or how to implement DATEADD(MONTH, 3, '2013-07-01') and DATEADD(YEAR, 1, '2014-01-01') in Spark or Hive? I can find Hive's date_add(string startdate, int days), but it works in days, not MONTH / YEAR. Thanks. Regards Arthur === My sample SQL with more than one subquery: SELECT S_NAME, COUNT(*) AS NUMWAIT FROM SUPPLIER, LINEITEM L1, ORDERS WHERE S_SUPPKEY = L1.L_SUPPKEY AND O_ORDERKEY = L1.L_ORDERKEY AND O_ORDERSTATUS = 'F' AND L1.L_RECEIPTDATE > L1.L_COMMITDATE AND EXISTS (SELECT * FROM LINEITEM L2 WHERE L2.L_ORDERKEY = L1.L_ORDERKEY AND L2.L_SUPPKEY <> L1.L_SUPPKEY) AND NOT EXISTS (SELECT * FROM LINEITEM L3 WHERE L3.L_ORDERKEY = L1.L_ORDERKEY AND L3.L_SUPPKEY <> L1.L_SUPPKEY AND L3.L_RECEIPTDATE > L3.L_COMMITDATE) GROUP BY S_NAME ORDER BY NUMWAIT DESC, S_NAME LIMIT 100; === Supplier Table: CREATE TABLE IF NOT EXISTS SUPPLIER ( S_SUPPKEY INTEGER PRIMARY KEY, S_NAME CHAR(25), S_ADDRESS VARCHAR(40), S_NATIONKEY BIGINT NOT NULL, S_PHONE CHAR(15), S_ACCTBAL DECIMAL, S_COMMENT VARCHAR(101) ) === Order Table: CREATE TABLE IF NOT EXISTS ORDERS ( O_ORDERKEY INTEGER PRIMARY KEY, O_CUSTKEY BIGINT NOT NULL, O_ORDERSTATUS CHAR(1), O_TOTALPRICE DECIMAL, O_ORDERDATE CHAR(10), O_ORDERPRIORITY CHAR(15), O_CLERK CHAR(15), O_SHIPPRIORITY INTEGER, O_COMMENT VARCHAR(79) ) === LineItem Table: CREATE TABLE IF NOT EXISTS LINEITEM ( L_ORDERKEY BIGINT NOT NULL, L_PARTKEY BIGINT, L_SUPPKEY BIGINT, L_LINENUMBER INTEGER NOT NULL, L_QUANTITY DECIMAL, L_EXTENDEDPRICE DECIMAL, L_DISCOUNT DECIMAL, L_TAX DECIMAL, L_SHIPDATE CHAR(10), L_COMMITDATE CHAR(10), L_RECEIPTDATE CHAR(10), L_RETURNFLAG CHAR(1), L_LINESTATUS CHAR(1), L_SHIPINSTRUCT CHAR(25), L_SHIPMODE CHAR(10), L_COMMENT VARCHAR(44), CONSTRAINT pk PRIMARY KEY (L_ORDERKEY, L_LINENUMBER) )
Re: Breaking the previous large-scale sort record with Spark
Wonderful !! On 11 Oct, 2014, at 12:00 am, Nan Zhu zhunanmcg...@gmail.com wrote: Great! Congratulations! -- Nan Zhu On Friday, October 10, 2014 at 11:19 AM, Mridul Muralidharan wrote: Brilliant stuff ! Congrats all :-) This is indeed really heartening news ! Regards, Mridul On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Hi folks, I interrupt your regularly scheduled user / dev list to bring you some pretty cool news for the project, which is that we've been able to use Spark to break MapReduce's 100 TB and 1 PB sort records, sorting data 3x faster on 10x fewer nodes. There's a detailed writeup at http://databricks.com/blog/2014/10/10/spark-breaks-previous-large-scale-sort-record.html. Summary: while Hadoop MapReduce held last year's 100 TB world record by sorting 100 TB in 72 minutes on 2100 nodes, we sorted it in 23 minutes on 206 nodes; and we also scaled up to sort 1 PB in 234 minutes. I want to thank Reynold Xin for leading this effort over the past few weeks, along with Parviz Deyhim, Xiangrui Meng, Aaron Davidson and Ali Ghodsi. In addition, we'd really like to thank Amazon's EC2 team for providing the machines to make this possible. Finally, this result would of course not be possible without the many many other contributions, testing and feature requests from throughout the community. For an engine to scale from these multi-hour petabyte batch jobs down to 100-millisecond streaming and interactive queries is quite uncommon, and it's thanks to all of you folks that we are able to make this happen. Matei - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
How to save Spark log into file
Hi, How can the Spark log be saved into a file instead of being shown on the console? Below is my conf/log4j.properties: ### # Root logger option log4j.rootLogger=INFO, file # Direct log messages to a log file log4j.appender.file=org.apache.log4j.RollingFileAppender # Redirect to Tomcat logs folder log4j.appender.file.File=/hadoop_logs/spark/spark_logging.log log4j.appender.file.MaxFileSize=10MB log4j.appender.file.MaxBackupIndex=10 log4j.appender.file.layout=org.apache.log4j.PatternLayout log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n ### I tried to stop and start Spark again, but it still shows INFO/WARN logs on the console. Any ideas? Regards Arthur scala> hiveContext.hql("show tables") warning: there were 1 deprecation warning(s); re-run with -deprecation for details 2014-10-03 19:35:01,554 INFO [main] parse.ParseDriver (ParseDriver.java:parse(179)) - Parsing command: show tables 2014-10-03 19:35:01,715 INFO [main] parse.ParseDriver (ParseDriver.java:parse(197)) - Parse Completed 2014-10-03 19:35:01,845 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.input.dir.recursive is deprecated. 
Instead, use mapreduce.input.fileinputformat.input.dir.recursive 2014-10-03 19:35:01,847 INFO [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - PERFLOG method=Driver.run 2014-10-03 19:35:01,847 INFO [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - PERFLOG method=TimeToSubmit 2014-10-03 19:35:01,847 INFO [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - PERFLOG method=compile 2014-10-03 19:35:01,863 INFO [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - PERFLOG method=parse 2014-10-03 19:35:01,863 INFO [main] parse.ParseDriver (ParseDriver.java:parse(179)) - Parsing command: show tables 2014-10-03 19:35:01,863 INFO [main] parse.ParseDriver (ParseDriver.java:parse(197)) - Parse Completed 2014-10-03 19:35:01,863 INFO [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - /PERFLOG method=parse start=1412336101863 end=1412336101863 duration=0 2014-10-03 19:35:01,863 INFO [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - PERFLOG method=semanticAnalyze 2014-10-03 19:35:01,941 INFO [main] ql.Driver (Driver.java:compile(450)) - Semantic Analysis Completed 2014-10-03 19:35:01,941 INFO [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - /PERFLOG method=semanticAnalyze start=1412336101863 end=1412336101941 duration=78 2014-10-03 19:35:01,979 INFO [main] exec.ListSinkOperator (Operator.java:initialize(338)) - Initializing Self 0 OP 2014-10-03 19:35:01,980 INFO [main] exec.ListSinkOperator (Operator.java:initializeChildren(403)) - Operator 0 OP initialized 2014-10-03 19:35:01,980 INFO [main] exec.ListSinkOperator (Operator.java:initialize(378)) - Initialization Done 0 OP 2014-10-03 19:35:01,985 INFO [main] ql.Driver (Driver.java:getSchema(264)) - Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null) 2014-10-03 19:35:01,985 INFO [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - /PERFLOG method=compile start=1412336101847 end=1412336101985 duration=138 2014-10-03 19:35:01,985 INFO 
[main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - PERFLOG method=Driver.execute 2014-10-03 19:35:01,985 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.job.name is deprecated. Instead, use mapreduce.job.name 2014-10-03 19:35:01,986 INFO [main] ql.Driver (Driver.java:execute(1117)) - Starting command: show tables 2014-10-03 19:35:01,994 INFO [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - /PERFLOG method=TimeToSubmit start=1412336101847 end=1412336101994 duration=147 2014-10-03 19:35:01,994 INFO [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - PERFLOG method=runTasks 2014-10-03 19:35:01,994 INFO [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - PERFLOG method=task.DDL.Stage-0 2014-10-03 19:35:02,019 INFO [main] metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(411)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 2014-10-03 19:35:02,034 INFO [main] metastore.ObjectStore (ObjectStore.java:initialize(232)) - ObjectStore, initialize called 2014-10-03 19:35:02,084 WARN [main] DataNucleus.General (Log4JLogger.java:warn(96)) - Plugin (Bundle) org.datanucleus.api.jdo is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL file://hadoop/spark/lib_managed/jars/datanucleus-api-jdo-3.2.1.jar is already registered, and you are trying to register an identical plugin located at URL file://hadoop/spark-1.1.0_patched/lib_managed/jars/datanucleus-api-jdo-3.2.1.jar. 2014-10-03 19:35:02,108 WARN [main] DataNucleus.General (Log4JLogger.java:warn(96)) - Plugin (Bundle) org.datanucleus is already registered. Ensure you dont have multiple JAR
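One likely explanation for the console output persisting: Spark reads log4j.properties from SPARK_CONF_DIR (default $SPARK_HOME/conf), while the Hive classes shown in the log (ql.Driver, ParseDriver, etc.) configure logging from their own hive-log4j.properties on the Hive side. A minimal file-only config is sketched below; it is written to ./log4j.properties purely for illustration and would need to be placed in the real conf directory:

```shell
# Minimal file-only log4j config (illustration; Spark expects this file at
# $SPARK_HOME/conf/log4j.properties or under SPARK_CONF_DIR).
cat > ./log4j.properties <<'EOF'
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/hadoop_logs/spark/spark_logging.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
EOF
```

Note there is no console appender at all here, so nothing should reach stdout from Spark's own loggers once this file is picked up.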
object hbase is not a member of package org.apache.hadoop
Hi, I have tried to run HBaseTest.scala, but I got the following errors; any ideas how to fix them? Q1) scala> package org.apache.spark.examples console:1: error: illegal start of definition package org.apache.spark.examples Q2) scala> import org.apache.hadoop.hbase.mapreduce.TableInputFormat console:31: error: object hbase is not a member of package org.apache.hadoop import org.apache.hadoop.hbase.mapreduce.TableInputFormat Regards Arthur
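The "object hbase is not a member of package org.apache.hadoop" error simply means the HBase jars are not on the REPL's classpath. One way to supply them is via --jars; the HBASE_LIB path below is an example assumption, not a path from the thread:

```shell
# Build a comma-separated --jars list from an HBase lib directory and hand it
# to the shell. HBASE_LIB is an example path -- point it at your install.
HBASE_LIB=${HBASE_LIB:-/opt/hbase/lib}
JARS=$(find "$HBASE_LIB" -name 'hbase-*.jar' 2>/dev/null | paste -sd, -)
echo "bin/spark-shell --jars $JARS"
```

(The Q1 error is separate: package declarations are simply not legal in the REPL, with or without HBase on the classpath.)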
Re: object hbase is not a member of package org.apache.hadoop
</artifactId>
+      </exclusion>
+      <exclusion>
+        <groupId>org.apache.hadoop</groupId>
+        <artifactId>hadoop-auth</artifactId>
+      </exclusion>
+      <exclusion>
+        <groupId>org.apache.hadoop</groupId>
+        <artifactId>hadoop-annotations</artifactId>
+      </exclusion>
+      <exclusion>
+        <groupId>org.apache.hadoop</groupId>
+        <artifactId>hadoop-hdfs</artifactId>
+      </exclusion>
+      <exclusion>
+        <groupId>org.apache.hbase</groupId>
+        <artifactId>hbase-hadoop1-compat</artifactId>
+      </exclusion>
+      <exclusion>
+        <groupId>org.apache.commons</groupId>
+        <artifactId>commons-math</artifactId>
+      </exclusion>
+      <exclusion>
+        <groupId>com.sun.jersey</groupId>
+        <artifactId>jersey-core</artifactId>
+      </exclusion>
+      <exclusion>
+        <groupId>org.slf4j</groupId>
+        <artifactId>slf4j-api</artifactId>
+      </exclusion>
+      <exclusion>
+        <groupId>com.sun.jersey</groupId>
+        <artifactId>jersey-server</artifactId>
+      </exclusion>
+      <exclusion>
+        <groupId>com.sun.jersey</groupId>
+        <artifactId>jersey-core</artifactId>
+      </exclusion>
+      <exclusion>
+        <groupId>com.sun.jersey</groupId>
+        <artifactId>jersey-json</artifactId>
+      </exclusion>
+      <exclusion>
+        <!-- hbase uses v2.4, which is better, but ... -->
+        <groupId>commons-io</groupId>
+        <artifactId>commons-io</artifactId>
+      </exclusion>
+    </exclusions>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.hbase</groupId>
+    <artifactId>hbase-hadoop-compat</artifactId>
+    <version>${hbase.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.hbase</groupId>
+    <artifactId>hbase-hadoop-compat</artifactId>
+    <version>${hbase.version}</version>
+    <type>test-jar</type>
+    <scope>test</scope>
+  </dependency>
   <dependency>
     <groupId>com.twitter</groupId>
     <artifactId>algebird-core_${scala.binary.version}</artifactId>
Please advise. Regards Arthur On 14 Sep, 2014, at 10:48 pm, Ted Yu yuzhih...@gmail.com wrote: Spark examples builds against hbase 0.94 by default. If you want to run against 0.98, see: SPARK-1297 https://issues.apache.org/jira/browse/SPARK-1297 Cheers On Sun, Sep 14, 2014 at 7:36 AM, arthur.hk.c...@gmail.com wrote: Hi, I have tried to run HBaseTest.scala, but I got the following errors, any ideas
to how to fix them? Q1) scala> package org.apache.spark.examples console:1: error: illegal start of definition package org.apache.spark.examples Q2) scala> import org.apache.hadoop.hbase.mapreduce.TableInputFormat console:31: error: object hbase is not a member of package org.apache.hadoop import org.apache.hadoop.hbase.mapreduce.TableInputFormat Regards Arthur
Re: object hbase is not a member of package org.apache.hadoop
Hi, Thanks! patch -p0 -i spark-1297-v5.txt patching file docs/building-with-maven.md patching file examples/pom.xml Hunk #1 FAILED at 45. Hunk #2 FAILED at 110. 2 out of 2 hunks FAILED -- saving rejects to file examples/pom.xml.rej Still got errors. Regards Arthur On 14 Sep, 2014, at 11:33 pm, Ted Yu yuzhih...@gmail.com wrote: spark-1297-v5.txt is level 0 patch Please use spark-1297-v5.txt Cheers On Sun, Sep 14, 2014 at 8:06 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, Thanks!! I tried to apply the patches, both spark-1297-v2.txt and spark-1297-v4.txt are good here, but not spark-1297-v5.txt: $ patch -p1 -i spark-1297-v4.txt patching file examples/pom.xml $ patch -p1 -i spark-1297-v5.txt can't find file to patch at input line 5 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -- |diff --git docs/building-with-maven.md docs/building-with-maven.md |index 672d0ef..f8bcd2b 100644 |--- docs/building-with-maven.md |+++ docs/building-with-maven.md -- File to patch: Please advise. Regards Arthur On 14 Sep, 2014, at 10:48 pm, Ted Yu yuzhih...@gmail.com wrote: Spark examples builds against hbase 0.94 by default. If you want to run against 0.98, see: SPARK-1297 https://issues.apache.org/jira/browse/SPARK-1297 Cheers On Sun, Sep 14, 2014 at 7:36 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I have tried to to run HBaseTest.scala, but I got following errors, any ideas to how to fix them? Q1) scala package org.apache.spark.examples console:1: error: illegal start of definition package org.apache.spark.examples Q2) scala import org.apache.hadoop.hbase.mapreduce.TableInputFormat console:31: error: object hbase is not a member of package org.apache.hadoop import org.apache.hadoop.hbase.mapreduce.TableInputFormat Regards Arthur
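The -p (strip) level confusion in this thread is worth spelling out: -pN tells patch how many leading path components to remove from the file names in the diff header before looking for the file. A self-contained demonstration (file names are made up for the example):

```shell
# A diff whose header says "--- docs/file.txt" (no a/ b/ prefix) needs -p0
# when applied from the directory containing docs/; "a/docs/..." style
# headers would need -p1 instead.
mkdir -p demo/docs
printf 'old\n' > demo/docs/file.txt
cat > demo/fix.patch <<'EOF'
--- docs/file.txt
+++ docs/file.txt
@@ -1 +1 @@
-old
+new
EOF
(cd demo && patch -p0 -i fix.patch)   # -p0: use paths exactly as written
cat demo/docs/file.txt                # now contains "new"
```

The spark-1297-v5.txt headers are of the first kind, which is why -p1 failed with "can't find file to patch" while -p0 succeeded.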
Re: object hbase is not a member of package org.apache.hadoop
Hi, My bad. Tried again, worked. patch -p0 -i spark-1297-v5.txt patching file docs/building-with-maven.md patching file examples/pom.xml Thanks! Arthur On 14 Sep, 2014, at 11:38 pm, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, Thanks! patch -p0 -i spark-1297-v5.txt patching file docs/building-with-maven.md patching file examples/pom.xml Hunk #1 FAILED at 45. Hunk #2 FAILED at 110. 2 out of 2 hunks FAILED -- saving rejects to file examples/pom.xml.rej Still got errors. Regards Arthur On 14 Sep, 2014, at 11:33 pm, Ted Yu yuzhih...@gmail.com wrote: spark-1297-v5.txt is level 0 patch Please use spark-1297-v5.txt Cheers On Sun, Sep 14, 2014 at 8:06 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, Thanks!! I tried to apply the patches, both spark-1297-v2.txt and spark-1297-v4.txt are good here, but not spark-1297-v5.txt: $ patch -p1 -i spark-1297-v4.txt patching file examples/pom.xml $ patch -p1 -i spark-1297-v5.txt can't find file to patch at input line 5 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -- |diff --git docs/building-with-maven.md docs/building-with-maven.md |index 672d0ef..f8bcd2b 100644 |--- docs/building-with-maven.md |+++ docs/building-with-maven.md -- File to patch: Please advise. Regards Arthur On 14 Sep, 2014, at 10:48 pm, Ted Yu yuzhih...@gmail.com wrote: Spark examples builds against hbase 0.94 by default. If you want to run against 0.98, see: SPARK-1297 https://issues.apache.org/jira/browse/SPARK-1297 Cheers On Sun, Sep 14, 2014 at 7:36 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I have tried to to run HBaseTest.scala, but I got following errors, any ideas to how to fix them? 
Q1) scala package org.apache.spark.examples console:1: error: illegal start of definition package org.apache.spark.examples Q2) scala import org.apache.hadoop.hbase.mapreduce.TableInputFormat console:31: error: object hbase is not a member of package org.apache.hadoop import org.apache.hadoop.hbase.mapreduce.TableInputFormat Regards Arthur
Re: object hbase is not a member of package org.apache.hadoop
Hi, I applied the patch. 1) patched $ patch -p0 -i spark-1297-v5.txt patching file docs/building-with-maven.md patching file examples/pom.xml 2) Compilation result [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM .. SUCCESS [1.550s] [INFO] Spark Project Core SUCCESS [1:32.175s] [INFO] Spark Project Bagel ... SUCCESS [10.809s] [INFO] Spark Project GraphX .. SUCCESS [31.435s] [INFO] Spark Project Streaming ... SUCCESS [44.518s] [INFO] Spark Project ML Library .. SUCCESS [48.992s] [INFO] Spark Project Tools ... SUCCESS [7.028s] [INFO] Spark Project Catalyst SUCCESS [40.365s] [INFO] Spark Project SQL . SUCCESS [43.305s] [INFO] Spark Project Hive SUCCESS [36.464s] [INFO] Spark Project REPL SUCCESS [20.319s] [INFO] Spark Project YARN Parent POM . SUCCESS [1.032s] [INFO] Spark Project YARN Stable API . SUCCESS [19.379s] [INFO] Spark Project Hive Thrift Server .. SUCCESS [12.470s] [INFO] Spark Project Assembly SUCCESS [13.822s] [INFO] Spark Project External Twitter SUCCESS [9.566s] [INFO] Spark Project External Kafka .. SUCCESS [12.848s] [INFO] Spark Project External Flume Sink . SUCCESS [10.437s] [INFO] Spark Project External Flume .. SUCCESS [14.554s] [INFO] Spark Project External ZeroMQ . SUCCESS [9.994s] [INFO] Spark Project External MQTT ... 
SUCCESS [8.684s] [INFO] Spark Project Examples SUCCESS [1:31.610s] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 9:41.700s [INFO] Finished at: Sun Sep 14 23:51:56 HKT 2014 [INFO] Final Memory: 83M/1071M [INFO] 3) testing: scala> package org.apache.spark.examples console:1: error: illegal start of definition package org.apache.spark.examples ^ scala> import org.apache.hadoop.hbase.client.HBaseAdmin import org.apache.hadoop.hbase.client.HBaseAdmin scala> import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor} import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor} scala> import org.apache.hadoop.hbase.mapreduce.TableInputFormat import org.apache.hadoop.hbase.mapreduce.TableInputFormat scala> import org.apache.spark._ import org.apache.spark._ scala> object HBaseTest { | def main(args: Array[String]) { | val sparkConf = new SparkConf().setAppName("HBaseTest") | val sc = new SparkContext(sparkConf) | val conf = HBaseConfiguration.create() | // Other options for configuring scan behavior are available. More information available at | // http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html | conf.set(TableInputFormat.INPUT_TABLE, args(0)) | // Initialize hBase table if necessary | val admin = new HBaseAdmin(conf) | if (!admin.isTableAvailable(args(0))) { | val tableDesc = new HTableDescriptor(args(0)) | admin.createTable(tableDesc) | } | val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], | classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable], | classOf[org.apache.hadoop.hbase.client.Result]) | hBaseRDD.count() | sc.stop() | } | } warning: there were 1 deprecation warning(s); re-run with -deprecation for details defined module HBaseTest Now I only get an error when trying to run "package org.apache.spark.examples". Please advise. Regards Arthur On 14 Sep, 2014, at 11:41 pm, Ted Yu yuzhih...@gmail.com wrote: I applied the patch on master branch without rejects. 
If you use spark 1.0.2, use pom.xml attached to the JIRA. On Sun, Sep 14, 2014 at 8:38 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, Thanks! patch -p0 -i spark-1297-v5.txt patching file docs/building-with-maven.md patching file examples/pom.xml Hunk #1 FAILED at 45. Hunk #2 FAILED at 110. 2 out of 2 hunks FAILED -- saving rejects to file examples/pom.xml.rej Still got errors. Regards Arthur On 14 Sep, 2014, at 11:33 pm, Ted Yu yuzhih...@gmail.com wrote: spark-1297-v5.txt is level 0 patch Please use spark-1297-v5.txt Cheers On Sun, Sep 14, 2014 at 8:06 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, Thanks!! I tried to apply the patches, both spark-1297-v2.txt and spark-1297-v4
unable to create new native thread
Hi I am trying the Spark sample program “SparkPi”, I got an error unable to create new native thread, how to resolve this? 14/09/11 21:36:16 INFO scheduler.DAGScheduler: Completed ResultTask(0, 644) 14/09/11 21:36:16 INFO scheduler.TaskSetManager: Finished TID 643 in 43 ms on node1 (progress: 636/10) 14/09/11 21:36:16 INFO scheduler.DAGScheduler: Completed ResultTask(0, 643) 14/09/11 21:36:16 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [spark-akka.actor.default-dispatcher-16] shutting down ActorSystem [spark] java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:714) at scala.concurrent.forkjoin.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1672) at scala.concurrent.forkjoin.ForkJoinPool.signalWork(ForkJoinPool.java:1966) at scala.concurrent.forkjoin.ForkJoinPool.externalPush(ForkJoinPool.java:1829) at scala.concurrent.forkjoin.ForkJoinPool.execute(ForkJoinPool.java:2955) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool.execute(AbstractDispatcher.scala:374) at akka.dispatch.ExecutorServiceDelegate$class.execute(ThreadPoolBuilder.scala:212) at akka.dispatch.Dispatcher$LazyExecutorServiceDelegate.execute(Dispatcher.scala:43) at akka.dispatch.Dispatcher.registerForExecution(Dispatcher.scala:118) at akka.dispatch.Dispatcher.dispatch(Dispatcher.scala:59) at akka.actor.dungeon.Dispatch$class.sendMessage(Dispatch.scala:120) at akka.actor.ActorCell.sendMessage(ActorCell.scala:338) at akka.actor.Cell$class.sendMessage(ActorCell.scala:259) at akka.actor.ActorCell.sendMessage(ActorCell.scala:338) at akka.actor.LocalActorRef.$bang(ActorRef.scala:389) at akka.actor.ActorRef.tell(ActorRef.scala:125) at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:489) at akka.actor.ActorCell.invoke(ActorCell.scala:455) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Regards Arthur
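"unable to create new native thread" is usually not a Java heap problem: it means the OS refused to create another thread, most often because of the per-user process/thread limit (nproc) or memory exhausted by thread stacks. A quick check and the usual remedies (user name and values below are examples):

```shell
# Check the per-user limit that governs native thread creation:
ulimit -u

# To raise it persistently for the user running the executors, add lines like
# these (example values) to /etc/security/limits.conf and re-login:
#   sparkuser  soft  nproc  32768
#   sparkuser  hard  nproc  32768
#
# Shrinking the per-thread stack (e.g. adding -Xss512k to the executor JVM
# options) also lets more threads fit into the same memory.
```

Lowering the number of concurrent tasks/partitions reduces thread pressure as well.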
Re: Dependency Problem with Spark / ScalaTest / SBT
Hi, What is your SBT command and the parameters? Arthur On 10 Sep, 2014, at 6:46 pm, Thorsten Bergler sp...@tbonline.de wrote: Hello, I am writing a Spark App which is already working so far. Now I started to build also some UnitTests, but I am running into some dependecy problems and I cannot find a solution right now. Perhaps someone could help me. I build my Spark Project with SBT and it seems to be configured well, because compiling, assembling and running the built jar with spark-submit are working well. Now I started with the UnitTests, which I located under /src/test/scala. When I call test in sbt, I get the following: 14/09/10 12:22:06 INFO storage.BlockManagerMaster: Registered BlockManager 14/09/10 12:22:06 INFO spark.HttpServer: Starting HTTP Server [trace] Stack trace suppressed: run last test:test for the full output. [error] Could not run test test.scala.SetSuite: java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse [info] Run completed in 626 milliseconds. [info] Total number of tests run: 0 [info] Suites: completed 0, aborted 0 [info] Tests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. 
[error] Error during tests: [error] test.scala.SetSuite [error] (test:test) sbt.TestsFailedException: Tests unsuccessful [error] Total time: 3 s, completed 10.09.2014 12:22:06 last test:test gives me the following: last test:test [debug] Running TaskDef(test.scala.SetSuite, org.scalatest.tools.Framework$$anon$1@6e5626c8, false, [SuiteSelector]) java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse at org.apache.spark.HttpServer.start(HttpServer.scala:54) at org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156) at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127) at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31) at org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48) at org.apache.spark.broadcast.BroadcastManager.init(BroadcastManager.scala:35) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218) at org.apache.spark.SparkContext.init(SparkContext.scala:202) at test.scala.SetSuite.init(SparkTest.scala:16) I also noticed right now, that sbt run is also not working: 14/09/10 12:44:46 INFO spark.HttpServer: Starting HTTP Server [error] (run-main-2) java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse at org.apache.spark.HttpServer.start(HttpServer.scala:54) at org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156) at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127) at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31) at org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48) at org.apache.spark.broadcast.BroadcastManager.init(BroadcastManager.scala:35) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218) at org.apache.spark.SparkContext.init(SparkContext.scala:202) at 
main.scala.PartialDuplicateScanner$.main(PartialDuplicateScanner.scala:29) at main.scala.PartialDuplicateScanner.main(PartialDuplicateScanner.scala) Here is my Testprojekt.sbt file:

name := "Testprojekt"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies ++= {
  Seq(
    "org.apache.lucene" % "lucene-core" % "4.9.0",
    "org.apache.lucene" % "lucene-analyzers-common" % "4.9.0",
    "org.apache.lucene" % "lucene-queryparser" % "4.9.0",
    ("org.apache.spark" %% "spark-core" % "1.0.2").
      exclude("org.mortbay.jetty", "servlet-api").
      exclude("commons-beanutils", "commons-beanutils-core").
      exclude("commons-collections", "commons-collections").
      exclude("commons-collections", "commons-collections").
      exclude("com.esotericsoftware.minlog", "minlog").
      exclude("org.eclipse.jetty.orbit", "javax.mail.glassfish").
      exclude("org.eclipse.jetty.orbit", "javax.transaction").
      exclude("org.eclipse.jetty.orbit", "javax.servlet")
  )
}
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
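The NoClassDefFoundError: javax/servlet/http/HttpServletResponse follows directly from the exclude("org.eclipse.jetty.orbit", "javax.servlet") line: the Servlet API classes are stripped from the classpath without being re-supplied. One common fix is to add a replacement servlet-api artifact; the version is an example choice and the local file path is for illustration:

```shell
# Append a replacement Servlet API dependency to the sbt build
# (javax.servlet:javax.servlet-api are the standard Maven coordinates;
# version 3.0.1 is an example).
cat >> ./Testprojekt.sbt <<'EOF'

// re-supply the servlet classes removed by the jetty-orbit exclusion
libraryDependencies += "javax.servlet" % "javax.servlet-api" % "3.0.1"
EOF
```

After reloading sbt, both `run` and `test` should find HttpServletResponse again.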
Re: Spark SQL -- more than two tables for join
Hi, Maybe you can take a look at the following: http://databricks.com/blog/2014/03/26/spark-sql-manipulating-structured-data-using-spark-2.html Good luck. Arthur

On 10 Sep, 2014, at 9:09 pm, arunshell87 shell.a...@gmail.com wrote:

Hi, I too tried SQL queries with joins, MINUS, subqueries, etc., but they did not work in Spark SQL. I did not find any documentation on which queries work and which do not in Spark SQL; maybe we have to wait for the Spark book to be released in Feb-2015. I believe you can try HiveQL in Spark for your requirement. Thanks, Arun

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-more-than-two-tables-for-join-tp13865p13877.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
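Since the suggestion in this thread is to fall back to HiveQL for multi-table joins, here is a sketch of what that looks like with the Spark 1.0.x HiveContext used elsewhere in this archive (the tables orders, customers, and items and their columns are made-up placeholders, not from this thread):

```scala
// Sketch only: assumes a running SparkContext `sc` and Hive tables named
// orders/customers/items (placeholder names) already registered in the
// metastore.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

// HiveQL handles three-way joins that the basic SQL parser of this era
// rejected:
val result = hiveContext.hql("""
  SELECT o.id, c.name, i.price
  FROM orders o
  JOIN customers c ON o.customer_id = c.id
  JOIN items i ON o.item_id = i.id
""")
result.collect().foreach(println)
```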
Spark and Shark
Hi, I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling from source).

spark: 1.0.2
shark: 0.9.2
hadoop: 2.4.1
java: java version "1.7.0_67"
protobuf: 2.5.0

I have tried the smoke test in Shark but got a "java.util.NoSuchElementException" error; can you please advise how to fix this?

shark> create table x1 (a INT);
FAILED: Hive Internal Error: java.util.NoSuchElementException(null)
14/09/01 23:04:24 [main]: ERROR shark.SharkDriver: FAILED: Hive Internal Error: java.util.NoSuchElementException(null)
java.util.NoSuchElementException
	at java.util.HashMap$HashIterator.nextEntry(HashMap.java:925)
	at java.util.HashMap$ValueIterator.next(HashMap.java:950)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:8117)
	at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:150)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
	at shark.SharkDriver.compile(SharkDriver.scala:215)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
	at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:340)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
	at shark.SharkCliDriver$.main(SharkCliDriver.scala:237)
	at shark.SharkCliDriver.main(SharkCliDriver.scala)

spark-env.sh:
#!/usr/bin/env bash
export CLASSPATH=$HBASE_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/mysql-connector-java-5.1.31-bin.jar
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop}
export SPARK_CLASSPATH=$SPARK_HOME/lib_managed/jars/mysql-connector-java-5.1.31-bin.jar
export SPARK_WORKER_MEMORY=2g
export HADOOP_HEAPSIZE=2000

spark-defaults.conf:
spark.executor.memory 2048m
spark.shuffle.spill.compress false

shark-env.sh:
#!/usr/bin/env bash
export SPARK_MEM=2g
export SHARK_MASTER_MEM=2g
SPARK_JAVA_OPTS="-Dspark.local.dir=/tmp "
SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 "
SPARK_JAVA_OPTS+="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps "
export SPARK_JAVA_OPTS
export SHARK_EXEC_MODE=yarn
export SPARK_ASSEMBLY_JAR=$SCALA_HOME/assembly/target/scala-2.10/spark-assembly-1.0.2-hadoop2.4.1.jar
export SHARK_ASSEMBLY_JAR=target/scala-2.10/shark_2.10-0.9.2.jar
export HIVE_CONF_DIR=$HIVE_HOME/conf
export SPARK_LIBPATH=$HADOOP_HOME/lib/native/
export SPARK_LIBRARY_PATH=$HADOOP_HOME/lib/native/
export SPARK_CLASSPATH=$SHARK_HOME/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar:$SHARK_HOME/lib/protobuf-java-2.5.0.jar

Regards Arthur
Re: Spark Hive max key length is 767 bytes
Hi Michael,

Thank you so much!! I have tried to change the following key lengths from 256 to 255 and from 767 to 766, but it still didn't work:

alter table COLUMNS_V2 modify column COMMENT VARCHAR(255);
alter table INDEX_PARAMS modify column PARAM_KEY VARCHAR(255);
alter table SD_PARAMS modify column PARAM_KEY VARCHAR(255);
alter table SERDE_PARAMS modify column PARAM_KEY VARCHAR(255);
alter table TABLE_PARAMS modify column PARAM_KEY VARCHAR(255);
alter table TBLS modify column OWNER VARCHAR(766);
alter table PART_COL_STATS modify column PARTITION_NAME VARCHAR(766);
alter table PARTITION_KEYS modify column PKEY_TYPE VARCHAR(766);
alter table PARTITIONS modify column PART_NAME VARCHAR(766);

I use Hadoop 2.4.1, HBase 0.98.5, and Hive 0.13, trying Spark 1.0.2 and Shark 0.9.2 with JDK 1.6.0_45. Some questions:
- shark-0.9.2 is based on which Hive version? Is HBase 0.98.x OK? Is Hive 0.13.1 OK? And which Java? (I use JDK 1.6 at the moment; it seems not to work.)
- spark-1.0.2 is based on which Hive version? Is HBase 0.98.x OK?

Regards Arthur

On 30 Aug, 2014, at 1:40 am, Michael Armbrust mich...@databricks.com wrote:

Spark SQL is based on Hive 12. They must have changed the maximum key size between 12 and 13.

On Fri, Aug 29, 2014 at 4:38 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi, Tried the same thing in HIVE directly without issue:

HIVE:
hive> create table test_datatype2 (testbigint bigint);
OK Time taken: 0.708 seconds
hive> drop table test_datatype2;
OK Time taken: 23.272 seconds

Then tried again in SPARK:

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
14/08/29 19:33:52 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated.
Instead, use mapreduce.reduce.speculative hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@395c7b94 scala val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc) res0: org.apache.spark.sql.SchemaRDD = SchemaRDD[0] at RDD at SchemaRDD.scala:104 == Query Plan == Native command: executed by Hive scala hiveContext.hql(drop table test_datatype3) 14/08/29 19:34:14 ERROR DataNucleus.Datastore: An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) 14/08/29 19:34:17 WARN DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MPartition and subclasses resulted in no possible candidates Error(s) were found while auto-creating/validating the datastore for classes. The errors are printed in the log, and are attached to this exception. org.datanucleus.exceptions.NucleusDataStoreException: Error(s) were found while auto-creating/validating the datastore for classes. The errors are printed in the log, and are attached to this exception. at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.verifyErrors(RDBMSStoreManager.java:3609) Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) 14/08/29 19:34:17 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MFieldSchema is tagged as embedded-only so does not have its own datastore table. 
14/08/29 19:34:17 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MOrder is tagged as embedded-only so does not have its own datastore table. 14/08/29 19:34:17 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MFieldSchema is tagged as embedded-only so does not have its own datastore table. 14/08/29 19:34:17 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MOrder is tagged as embedded-only so does not have its own datastore table. 14/08/29 19:34:17 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MFieldSchema is tagged as embedded-only so does not have its own datastore table. 14/08/29 19:34:17 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MOrder is tagged as embedded-only so does not have its own datastore table. 14/08/29 19:34:25 ERROR DataNucleus.Datastore: An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes
Re: Spark Hive max key length is 767 bytes
Hi, Already done, but I still get the same error. (I use Hive 0.13.1, Spark 1.0.2, Hadoop 2.4.1.) Steps:

Step 1) mysql:
alter database hive character set latin1;

Step 2) HIVE:
hive> create table test_datatype2 (testbigint bigint);
OK Time taken: 0.708 seconds
hive> drop table test_datatype2;
OK Time taken: 23.272 seconds

Step 3)
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
14/08/29 19:33:52 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@395c7b94
scala> hiveContext.hql("create table test_datatype3 (testbigint bigint)")
res0: org.apache.spark.sql.SchemaRDD = SchemaRDD[0] at RDD at SchemaRDD.scala:104
== Query Plan ==
Native command: executed by Hive
scala> hiveContext.hql("drop table test_datatype3")
14/08/29 19:34:14 ERROR DataNucleus.Datastore: An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
14/08/29 19:34:17 WARN DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MPartition and subclasses resulted in no possible candidates
Error(s) were found while auto-creating/validating the datastore for classes. The errors are printed in the log, and are attached to this exception.
org.datanucleus.exceptions.NucleusDataStoreException: Error(s) were found while auto-creating/validating the datastore for classes. The errors are printed in the log, and are attached to this exception.
at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.verifyErrors(RDBMSStoreManager.java:3609) Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) Should I use HIVE 0.12.0 instead of HIVE 0.13.1? Regards Arthur On 31 Aug, 2014, at 6:01 am, Denny Lee denny.g@gmail.com wrote: Oh, you may be running into an issue with your MySQL setup actually, try running alter database metastore_db character set latin1 so that way Hive (and the Spark HiveContext) can execute properly against the metastore. On August 29, 2014 at 04:39:01, arthur.hk.c...@gmail.com (arthur.hk.c...@gmail.com) wrote: Hi, Tried the same thing in HIVE directly without issue: HIVE: hive create table test_datatype2 (testbigint bigint ); OK Time taken: 0.708 seconds hive drop table test_datatype2; OK Time taken: 23.272 seconds Then tried again in SPARK: scala val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc) 14/08/29 19:33:52 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. 
Instead, use mapreduce.reduce.speculative hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@395c7b94 scala val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc) res0: org.apache.spark.sql.SchemaRDD = SchemaRDD[0] at RDD at SchemaRDD.scala:104 == Query Plan == Native command: executed by Hive scala hiveContext.hql(drop table test_datatype3) 14/08/29 19:34:14 ERROR DataNucleus.Datastore: An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) 14/08/29 19:34:17 WARN DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MPartition and subclasses resulted in no possible candidates Error(s) were found while auto-creating/validating the datastore for classes. The errors are printed in the log, and are attached to this exception. org.datanucleus.exceptions.NucleusDataStoreException: Error(s) were found while auto-creating/validating the datastore for classes. The errors are printed in the log, and are attached to this exception. at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.verifyErrors(RDBMSStoreManager.java:3609) Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) 14/08/29 19:34:17 INFO
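One thing worth checking after the alter database step quoted above: in MySQL, a database-level character set change only sets the default for tables created afterwards; metastore tables that already exist keep their original (typically utf8) character set, which can keep triggering the 767-byte index limit. A sketch for inspecting this, assuming the metastore database is named hive as in the alter statement:

```sql
-- Default character set of the metastore database after the change:
SELECT default_character_set_name
FROM information_schema.SCHEMATA
WHERE schema_name = 'hive';

-- Per-table collations; any utf8 entries predate the latin1 change and
-- may still hit "max key length is 767 bytes":
SELECT table_name, table_collation
FROM information_schema.TABLES
WHERE table_schema = 'hive';
```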
Spark Master/Slave and HA
Hi, I have a few questions about Spark Master and Slave setup. I have 5 Hadoop nodes (n1, n2, n3, n4, and n5); at the moment I run Spark on these nodes:

n1: Hadoop Active Name Node, Hadoop Slave, Spark Active Master
n2: Hadoop Standby Name Node, Hadoop Slave, Spark Slave
n3: Hadoop Slave, Spark Slave
n4: Hadoop Slave, Spark Slave
n5: Hadoop Slave, Spark Slave

Questions:

Q1: If I set n1 as both Spark Master and Spark Slave, I cannot start the Spark cluster. Does it mean that, unlike Hadoop, I cannot use the same machine as both MASTER and SLAVE in Spark?

n1: Hadoop Active Name Node, Hadoop Slave, Spark Active Master, Spark Slave (failed to start Spark)
n2: Hadoop Standby Name Node, Hadoop Slave, Spark Slave
n3: Hadoop Slave, Spark Slave
n4: Hadoop Slave, Spark Slave
n5: Hadoop Slave, Spark Slave

Q2: I am planning Spark HA. What if I use n2 as both Spark Standby Master and Spark Slave? Is Spark allowed to run a Standby Master and a Slave on the same machine?

n1: Hadoop Active Name Node, Hadoop Slave, Spark Active Master
n2: Hadoop Standby Name Node, Hadoop Slave, Spark Standby Master, Spark Slave
n3: Hadoop Slave, Spark Slave
n4: Hadoop Slave, Spark Slave
n5: Hadoop Slave, Spark Slave

Q3: Does the Spark Master node do actual computation work like a worker, or is it just a pure monitoring node?

Regards Arthur
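Regarding the HA plan in Q2: Spark's standalone mode documents master failover through ZooKeeper election rather than a Hadoop-style active/standby pairing. A sketch of the relevant conf/spark-env.sh entry on each master candidate (the ZooKeeper hostnames and paths are illustrative assumptions, not from this thread):

```bash
# Sketch: standalone-master HA via ZooKeeper. Every master started with
# these options joins the same election; one becomes active and the rest
# stand by.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"
```

Workers are then pointed at the full master list, e.g. spark://n1:7077,n2:7077, so they can re-register with whichever master is elected.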
Spark and Shark Node: RAM Allocation
Hi, Is there any formula to calculate proper RAM allocation values for Spark and Shark based on physical RAM and HADOOP/HBASE RAM usage? E.g. if a node has 32GB physical RAM:

spark-defaults.conf:
spark.executor.memory ?g

spark-env.sh:
export SPARK_WORKER_MEMORY=?
export HADOOP_HEAPSIZE=?

shark-env.sh:
export SPARK_MEM=?g
export SHARK_MASTER_MEM=?g

Regards Arthur
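There is no official formula for this split; as a sketch only, one plausible budget for a 32 GB node shared by HDFS, HBase, and Spark might look like the following (every number below is an illustrative assumption, not a recommendation from the docs):

```bash
# Illustrative 32 GB split (assumptions):
#   OS + other daemons        ~4 GB
#   Hadoop daemons            ~4 GB   (HADOOP_HEAPSIZE is in MB)
#   HBase RegionServer        ~8 GB
#   Spark worker + executors  ~16 GB
export HADOOP_HEAPSIZE=4000          # hadoop-env.sh / spark-env.sh
export SPARK_WORKER_MEMORY=16g       # spark-env.sh
# spark-defaults.conf: spark.executor.memory must fit within the worker's 16g
# shark-env.sh: SPARK_MEM / SHARK_MASTER_MEM should not exceed that share either
```

The key constraint is that the per-node sum of all heaps stays below physical RAM with headroom for OS page cache; the exact ratios depend on the workload.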
Re: Spark Hive max key length is 767 bytes
Hi, Tried the same thing in HIVE directly without issue:

HIVE:
hive> create table test_datatype2 (testbigint bigint);
OK Time taken: 0.708 seconds
hive> drop table test_datatype2;
OK Time taken: 23.272 seconds

Then tried again in SPARK:

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
14/08/29 19:33:52 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@395c7b94
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
res0: org.apache.spark.sql.SchemaRDD = SchemaRDD[0] at RDD at SchemaRDD.scala:104
== Query Plan ==
Native command: executed by Hive
scala> hiveContext.hql("drop table test_datatype3")
14/08/29 19:34:14 ERROR DataNucleus.Datastore: An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
14/08/29 19:34:17 WARN DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MPartition and subclasses resulted in no possible candidates
Error(s) were found while auto-creating/validating the datastore for classes. The errors are printed in the log, and are attached to this exception.
org.datanucleus.exceptions.NucleusDataStoreException: Error(s) were found while auto-creating/validating the datastore for classes. The errors are printed in the log, and are attached to this exception.
at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.verifyErrors(RDBMSStoreManager.java:3609) Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) 14/08/29 19:34:17 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MFieldSchema is tagged as embedded-only so does not have its own datastore table. 14/08/29 19:34:17 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MOrder is tagged as embedded-only so does not have its own datastore table. 14/08/29 19:34:17 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MFieldSchema is tagged as embedded-only so does not have its own datastore table. 14/08/29 19:34:17 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MOrder is tagged as embedded-only so does not have its own datastore table. 14/08/29 19:34:17 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MFieldSchema is tagged as embedded-only so does not have its own datastore table. 14/08/29 19:34:17 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MOrder is tagged as embedded-only so does not have its own datastore table. 14/08/29 19:34:25 ERROR DataNucleus.Datastore: An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) Can anyone please help? 
Regards Arthur

On 29 Aug, 2014, at 12:47 pm, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

(Please ignore if duplicated) Hi, I use Spark 1.0.2 with Hive 0.13.1. I have already set the Hive MySQL database to latin1:

mysql: alter database hive character set latin1;

Spark:
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> hiveContext.hql("create table test_datatype1 (testbigint bigint)")
scala> hiveContext.hql("drop table test_datatype1")
14/08/29 12:31:55 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MFieldSchema is tagged as embedded-only so does not have its own datastore table.
14/08/29 12:31:55 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MOrder is tagged as embedded-only so does not have its own datastore table.
14/08/29 12:31:55 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MFieldSchema is tagged as embedded-only so does not have its own datastore table.
14/08/29 12:31:55 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MOrder is tagged as embedded-only so does not have its own datastore table.
14/08/29 12:31:59 ERROR DataNucleus.Datastore: An exception was thrown while adding
Re: Compilation Error: Spark 1.0.2 with HBase 0.98
] | \- javax.activation:activation:jar:1.1:compile
[INFO] +- org.apache.hbase:hbase-hadoop-compat:jar:0.98.5-hadoop1:compile
[INFO] |  \- org.apache.commons:commons-math:jar:2.1:compile
[INFO] +- org.apache.hbase:hbase-hadoop-compat:test-jar:tests:0.98.5-hadoop1:test
[INFO] +- com.twitter:algebird-core_2.10:jar:0.1.11:compile
[INFO] |  \- com.googlecode.javaewah:JavaEWAH:jar:0.6.6:compile
[INFO] +- org.scalatest:scalatest_2.10:jar:2.1.5:test
[INFO] |  \- org.scala-lang:scala-reflect:jar:2.10.4:compile
[INFO] +- org.scalacheck:scalacheck_2.10:jar:1.11.3:test
[INFO] |  \- org.scala-sbt:test-interface:jar:1.0:test
[INFO] +- org.apache.cassandra:cassandra-all:jar:1.2.6:compile
[INFO] |  +- net.jpountz.lz4:lz4:jar:1.1.0:compile
[INFO] |  +- org.antlr:antlr:jar:3.2:compile
[INFO] |  +- com.googlecode.json-simple:json-simple:jar:1.1:compile
[INFO] |  +- org.yaml:snakeyaml:jar:1.6:compile
[INFO] |  +- edu.stanford.ppl:snaptree:jar:0.1:compile
[INFO] |  +- org.mindrot:jbcrypt:jar:0.3m:compile
[INFO] |  +- org.apache.thrift:libthrift:jar:0.7.0:compile
[INFO] |  |  \- javax.servlet:servlet-api:jar:2.5:compile
[INFO] |  +- org.apache.cassandra:cassandra-thrift:jar:1.2.6:compile
[INFO] |  \- com.github.stephenc:jamm:jar:0.2.5:compile
[INFO] \- com.github.scopt:scopt_2.10:jar:3.2.0:compile
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 2.151s
[INFO] Finished at: Thu Aug 28 18:12:45 HKT 2014
[INFO] Final Memory: 17M/479M
[INFO]

Regards Arthur

On 28 Aug, 2014, at 12:22 pm, Ted Yu yuzhih...@gmail.com wrote:

I forgot to include '-Dhadoop.version=2.4.1' in the command below. The modified command passed. You can verify the dependence on hbase 0.98 through this command: mvn -Phbase-hadoop2,hadoop-2.4,yarn -Dhadoop.version=2.4.1 -DskipTests dependency:tree > dep.txt Cheers

On Wed, Aug 27, 2014 at 8:58 PM, Ted Yu yuzhih...@gmail.com wrote:

Looks like the patch given by that URL only had the last commit. I have attached pom.xml for spark-1.0.2 to SPARK-1297. You can download it and replace examples/pom.xml
with the downloaded pom. I am running this command locally: mvn -Phbase-hadoop2,hadoop-2.4,yarn -DskipTests clean package Cheers

On Wed, Aug 27, 2014 at 7:57 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi Ted, Thanks. Tried [patch -p1 -i 1893.patch] (Hunk #1 FAILED at 45.) Is this normal? Regards Arthur

patch -p1 -i 1893.patch
patching file examples/pom.xml
Hunk #1 FAILED at 45.
Hunk #2 succeeded at 94 (offset -16 lines).
1 out of 2 hunks FAILED -- saving rejects to file examples/pom.xml.rej
patching file examples/pom.xml
Hunk #1 FAILED at 54.
Hunk #2 FAILED at 72.
Hunk #3 succeeded at 122 (offset -49 lines).
2 out of 3 hunks FAILED -- saving rejects to file examples/pom.xml.rej
patching file docs/building-with-maven.md
patching file examples/pom.xml
Hunk #1 succeeded at 122 (offset -40 lines).
Hunk #2 succeeded at 195 (offset -40 lines).

On 28 Aug, 2014, at 10:53 am, Ted Yu yuzhih...@gmail.com wrote:

Can you use this command?

patch -p1 -i 1893.patch

Cheers

On Wed, Aug 27, 2014 at 7:41 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi Ted, I tried the following steps to apply the patch 1893 but got Hunk FAILED; can you please advise how to get through this error? Or is my spark-1.0.2 source not the correct one? Regards Arthur

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2.tgz
tar -vxf spark-1.0.2.tgz
cd spark-1.0.2
wget https://github.com/apache/spark/pull/1893.patch
patch 1893.patch
patching file pom.xml
Hunk #1 FAILED at 45.
Hunk #2 FAILED at 110.
2 out of 2 hunks FAILED -- saving rejects to file pom.xml.rej
patching file pom.xml
Hunk #1 FAILED at 54.
Hunk #2 FAILED at 72.
Hunk #3 FAILED at 171.
3 out of 3 hunks FAILED -- saving rejects to file pom.xml.rej
can't find file to patch at input line 267
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
--
|
|From cd58437897bf02b644c2171404ccffae5d12a2be Mon Sep 17 00:00:00 2001
|From: tedyu yuzhih...@gmail.com
|Date: Mon, 11 Aug 2014 15:57:46 -0700
|Subject: [PATCH 3/4] SPARK-1297 Upgrade HBase dependency to 0.98 - add
| description to building-with-maven.md
|
|---
| docs/building-with-maven.md | 3 +++
| 1 file changed, 3 insertions(+)
|
|diff --git a/docs/building-with-maven.md b/docs/building-with-maven.md
|index 672d0ef..f8bcd2b 100644
|--- a/docs/building-with-maven.md
|+++ b/docs/building-with-maven.md
--
File to patch:

On 28 Aug, 2014, at 10:24 am, Ted Yu yuzhih...@gmail.com wrote:

You can get the patch from this URL: https://github.com/apache/spark/pull/1893.patch BTW 0.98.5 has been released - you can specify 0.98.5-hadoop2 in the pom.xml Cheers

On Wed, Aug 27, 2014 at 7:18 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com
Re: Compilation Error: Spark 1.0.2 with HBase 0.98
Hi, I tried to start Spark but failed:

$ ./sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /mnt/hadoop/spark-1.0.2/sbin/../logs/spark-edhuser-org.apache.spark.deploy.master.Master-1-m133.out
failed to launch org.apache.spark.deploy.master.Master:
	Failed to find Spark assembly in /mnt/hadoop/spark-1.0.2/assembly/target/scala-2.10/

$ ll assembly/
total 20
-rw-rw-r--. 1 hduser hadoop 11795 Jul 26 05:50 pom.xml
-rw-rw-r--. 1 hduser hadoop   507 Jul 26 05:50 README
drwxrwxr-x. 4 hduser hadoop  4096 Jul 26 05:50 src

Regards Arthur

On 28 Aug, 2014, at 6:19 pm, Ted Yu yuzhih...@gmail.com wrote:

I see 0.98.5 in dep.txt. You should be good to go.

On Thu, Aug 28, 2014 at 3:16 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi, tried mvn -Phbase-hadoop2,hadoop-2.4,yarn -Dhadoop.version=2.4.1 -DskipTests dependency:tree > dep.txt Attached the dep.txt for your information. Regards Arthur

On 28 Aug, 2014, at 12:22 pm, Ted Yu yuzhih...@gmail.com wrote:

I forgot to include '-Dhadoop.version=2.4.1' in the command below. The modified command passed. You can verify the dependence on hbase 0.98 through this command: mvn -Phbase-hadoop2,hadoop-2.4,yarn -Dhadoop.version=2.4.1 -DskipTests dependency:tree > dep.txt Cheers

On Wed, Aug 27, 2014 at 8:58 PM, Ted Yu yuzhih...@gmail.com wrote:

Looks like the patch given by that URL only had the last commit. I have attached pom.xml for spark-1.0.2 to SPARK-1297. You can download it and replace examples/pom.xml with the downloaded pom. I am running this command locally: mvn -Phbase-hadoop2,hadoop-2.4,yarn -DskipTests clean package Cheers

On Wed, Aug 27, 2014 at 7:57 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi Ted, Thanks. Tried [patch -p1 -i 1893.patch] (Hunk #1 FAILED at 45.) Is this normal? Regards Arthur

patch -p1 -i 1893.patch
patching file examples/pom.xml
Hunk #1 FAILED at 45.
Hunk #2 succeeded at 94 (offset -16 lines).
1 out of 2 hunks FAILED -- saving rejects to file examples/pom.xml.rej
patching file examples/pom.xml
Hunk #1 FAILED at 54.
Hunk #2 FAILED at 72.
Hunk #3 succeeded at 122 (offset -49 lines).
2 out of 3 hunks FAILED -- saving rejects to file examples/pom.xml.rej
patching file docs/building-with-maven.md
patching file examples/pom.xml
Hunk #1 succeeded at 122 (offset -40 lines).
Hunk #2 succeeded at 195 (offset -40 lines).

On 28 Aug, 2014, at 10:53 am, Ted Yu yuzhih...@gmail.com wrote:

Can you use this command?

patch -p1 -i 1893.patch

Cheers

On Wed, Aug 27, 2014 at 7:41 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi Ted, I tried the following steps to apply the patch 1893 but got Hunk FAILED; can you please advise how to get through this error? Or is my spark-1.0.2 source not the correct one? Regards Arthur

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2.tgz
tar -vxf spark-1.0.2.tgz
cd spark-1.0.2
wget https://github.com/apache/spark/pull/1893.patch
patch 1893.patch
patching file pom.xml
Hunk #1 FAILED at 45.
Hunk #2 FAILED at 110.
2 out of 2 hunks FAILED -- saving rejects to file pom.xml.rej
patching file pom.xml
Hunk #1 FAILED at 54.
Hunk #2 FAILED at 72.
Hunk #3 FAILED at 171.
3 out of 3 hunks FAILED -- saving rejects to file pom.xml.rej
can't find file to patch at input line 267
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
--
|
|From cd58437897bf02b644c2171404ccffae5d12a2be Mon Sep 17 00:00:00 2001
|From: tedyu yuzhih...@gmail.com
|Date: Mon, 11 Aug 2014 15:57:46 -0700
|Subject: [PATCH 3/4] SPARK-1297 Upgrade HBase dependency to 0.98 - add
| description to building-with-maven.md
|
|---
| docs/building-with-maven.md | 3 +++
| 1 file changed, 3 insertions(+)
|
|diff --git a/docs/building-with-maven.md b/docs/building-with-maven.md
|index 672d0ef..f8bcd2b 100644
|--- a/docs/building-with-maven.md
|+++ b/docs/building-with-maven.md
--
File to patch:

On 28 Aug, 2014, at 10:24 am, Ted Yu yuzhih...@gmail.com wrote:

You can get the patch from this URL: https://github.com/apache/spark/pull/1893.patch BTW 0.98.5 has been released - you can specify 0.98.5-hadoop2 in the pom.xml Cheers

On Wed, Aug 27, 2014 at 7:18 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi Ted, Thank you so much!! As I am new to Spark, can you please advise the steps about how to apply this patch to my spark-1.0.2 source folder? Regards Arthur

On 28 Aug, 2014, at 10:13 am, Ted Yu yuzhih...@gmail.com wrote:

See SPARK-1297. The pull request is here: https://github.com/apache/spark/pull/1893

On Wed, Aug 27, 2014 at 6:57 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

(correction: "Compilation Error: Spark 1.0.2 with HBase 0.98", please ignore if duplicated)
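The "Failed to find Spark assembly" error near the top of this exchange means the source tree was never packaged: the launcher scripts look for the assembly jar under assembly/target/scala-2.10/. A sketch of the missing step, reusing the Maven flags already quoted in this thread (the flags are from the thread; that they resolve this particular launch failure is an assumption):

```bash
# Build the assembly jar first, then start the standalone daemons.
cd /mnt/hadoop/spark-1.0.2
mvn -Phbase-hadoop2,hadoop-2.4,yarn -Dhadoop.version=2.4.1 -DskipTests clean package
ls assembly/target/scala-2.10/   # a spark-assembly-*.jar should now exist
./sbin/start-all.sh
```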
SPARK-1297 patch error (spark-1297-v4.txt )
Hi, I have just tried to apply the patch of SPARK-1297: https://issues.apache.org/jira/browse/SPARK-1297 There are two files in it, named spark-1297-v2.txt and spark-1297-v4.txt respectively. When applying the 2nd one, I got "Hunk #1 FAILED at 45". Can you please advise how to fix it in order to make the compilation of Spark Project Examples succeed? (Here: Hadoop 2.4.1, HBase 0.98.5, Spark 1.0.2) Regards Arthur

patch -p1 -i spark-1297-v4.txt
patching file examples/pom.xml
Hunk #1 FAILED at 45.
Hunk #2 succeeded at 94 (offset -16 lines).
1 out of 2 hunks FAILED -- saving rejects to file examples/pom.xml.rej

Below is the content of examples/pom.xml.rej:

+++ examples/pom.xml
@@ -45,6 +45,39 @@
     </dependency>
   </dependencies>
 </profile>
+<profile>
+  <id>hbase-hadoop2</id>
+  <activation>
+    <property>
+      <name>hbase.profile</name>
+      <value>hadoop2</value>
+    </property>
+  </activation>
+  <properties>
+    <protobuf.version>2.5.0</protobuf.version>
+    <hbase.version>0.98.4-hadoop2</hbase.version>
+  </properties>
+  <dependencyManagement>
+    <dependencies>
+    </dependencies>
+  </dependencyManagement>
+</profile>
+<profile>
+  <id>hbase-hadoop1</id>
+  <activation>
+    <property>
+      <name>!hbase.profile</name>
+    </property>
+  </activation>
+  <properties>
+    <hbase.version>0.98.4-hadoop1</hbase.version>
+  </properties>
+  <dependencyManagement>
+    <dependencies>
+    </dependencies>
+  </dependencyManagement>
+</profile>
+</profiles>
 <dependencies>

This caused the related compilation to fail: [INFO] Spark Project Examples FAILURE [0.102s]
Re: SPARK-1297 patch error (spark-1297-v4.txt )
Hi,

patch -p1 -i spark-1297-v5.txt
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--
|diff --git docs/building-with-maven.md docs/building-with-maven.md
|index 672d0ef..f8bcd2b 100644
|--- docs/building-with-maven.md
|+++ docs/building-with-maven.md
--
File to patch:

Please advise. Regards Arthur

On 29 Aug, 2014, at 12:50 am, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi, I have just tried to apply the patch of SPARK-1297: https://issues.apache.org/jira/browse/SPARK-1297 There are two files in it, named spark-1297-v2.txt and spark-1297-v4.txt respectively. When applying the 2nd one, I got "Hunk #1 FAILED at 45". Can you please advise how to fix it in order to make the compilation of Spark Project Examples succeed? (Here: Hadoop 2.4.1, HBase 0.98.5, Spark 1.0.2) Regards Arthur

patch -p1 -i spark-1297-v4.txt
patching file examples/pom.xml
Hunk #1 FAILED at 45.
Hunk #2 succeeded at 94 (offset -16 lines).
1 out of 2 hunks FAILED -- saving rejects to file examples/pom.xml.rej

Below is the content of examples/pom.xml.rej:

+++ examples/pom.xml
@@ -45,6 +45,39 @@
     </dependency>
   </dependencies>
 </profile>
+<profile>
+  <id>hbase-hadoop2</id>
+  <activation>
+    <property>
+      <name>hbase.profile</name>
+      <value>hadoop2</value>
+    </property>
+  </activation>
+  <properties>
+    <protobuf.version>2.5.0</protobuf.version>
+    <hbase.version>0.98.4-hadoop2</hbase.version>
+  </properties>
+  <dependencyManagement>
+    <dependencies>
+    </dependencies>
+  </dependencyManagement>
+</profile>
+<profile>
+  <id>hbase-hadoop1</id>
+  <activation>
+    <property>
+      <name>!hbase.profile</name>
+    </property>
+  </activation>
+  <properties>
+    <hbase.version>0.98.4-hadoop1</hbase.version>
+  </properties>
+  <dependencyManagement>
+    <dependencies>
+    </dependencies>
+  </dependencyManagement>
+</profile>
+</profiles>
 <dependencies>

This caused the related compilation to fail: [INFO] Spark Project Examples FAILURE [0.102s]
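The "can't find file to patch" prompt above is the usual symptom of the wrong -p strip level: the headers in spark-1297-v5.txt carry no a/ b/ prefixes, so the patch applies at level 0 (Ted confirms this downthread). A self-contained sketch of the difference, using a throwaway file rather than the real Spark tree:

```shell
# Demo: a diff whose headers have no a/ b/ prefix must be applied with -p0.
# Uses a throwaway directory; file names mimic the real patch but this is a demo.
mkdir -p /tmp/striplevel-demo/docs
cd /tmp/striplevel-demo
printf 'hello\n' > docs/building-with-maven.md
cat > fix.patch <<'EOF'
--- docs/building-with-maven.md
+++ docs/building-with-maven.md
@@ -1 +1 @@
-hello
+hello world
EOF
patch -p0 -i fix.patch   # applies cleanly; -p1 would strip "docs/" and fail to find the file
```

With -p1, patch would strip the leading "docs/" component and prompt "File to patch:", exactly as in the session above.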
Re: SPARK-1297 patch error (spark-1297-v4.txt )
Hi Ted,

I downloaded pom.xml to the examples directory. It works, thanks!!

Regards
Arthur

[INFO] Reactor Summary:
[INFO] Spark Project Parent POM ........... SUCCESS [2.119s]
[INFO] Spark Project Core ................. SUCCESS [1:27.100s]
[INFO] Spark Project Bagel ................ SUCCESS [10.261s]
[INFO] Spark Project GraphX ............... SUCCESS [31.332s]
[INFO] Spark Project ML Library ........... SUCCESS [35.226s]
[INFO] Spark Project Streaming ............ SUCCESS [39.135s]
[INFO] Spark Project Tools ................ SUCCESS [6.469s]
[INFO] Spark Project Catalyst ............. SUCCESS [36.521s]
[INFO] Spark Project SQL .................. SUCCESS [35.488s]
[INFO] Spark Project Hive ................. SUCCESS [35.296s]
[INFO] Spark Project REPL ................. SUCCESS [18.668s]
[INFO] Spark Project YARN Parent POM ...... SUCCESS [0.583s]
[INFO] Spark Project YARN Stable API ...... SUCCESS [15.989s]
[INFO] Spark Project Assembly ............. SUCCESS [11.497s]
[INFO] Spark Project External Twitter ..... SUCCESS [8.777s]
[INFO] Spark Project External Kafka ....... SUCCESS [9.688s]
[INFO] Spark Project External Flume ....... SUCCESS [10.411s]
[INFO] Spark Project External ZeroMQ ...... SUCCESS [9.511s]
[INFO] Spark Project External MQTT ........ SUCCESS [8.451s]
[INFO] Spark Project Examples ............. SUCCESS [1:40.240s]
[INFO] BUILD SUCCESS
[INFO] Total time: 8:33.350s
[INFO] Finished at: Fri Aug 29 01:58:00 HKT 2014
[INFO] Final Memory: 82M/1086M

On 29 Aug, 2014, at 1:36 am, Ted Yu yuzhih...@gmail.com wrote:

bq. Spark 1.0.2

For the above release, you can download the pom.xml attached to the JIRA and place it in the examples directory. I verified that the build against 0.98.4 worked using this command:

mvn -Dhbase.profile=hadoop2 -Phadoop-2.4,yarn -Dhadoop.version=2.4.1 -DskipTests clean package

Patch v5 is @ level 0 - you don't need to use -p1 in the patch command.
Cheers
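Ted's note about patch levels is the crux of the earlier failures: `-p` tells `patch` how many leading path components to strip from the filenames in the diff headers, and spark-1297-v5.txt is at level 0, so it should be applied with `-p0` from the top of the Spark source tree. A minimal self-contained sketch (a throwaway file and patch, not the actual SPARK-1297 files) of why the level matters:

```shell
# A level-0 patch names its target as "demo.txt" with no leading
# directory, so -p0 (use the path as-is) finds it, while -p1 would try
# to strip a path component that isn't there and fail. The same logic
# applies to "patch -p0 -i spark-1297-v5.txt" run from the source root.
set -e
workdir=$(mktemp -d)
cd "$workdir"
printf 'old line\n' > demo.txt
cat > level0.patch <<'EOF'
--- demo.txt
+++ demo.txt
@@ -1 +1 @@
-old line
+new line
EOF
patch -p0 --dry-run -i level0.patch   # probe: reports failures without touching files
patch -p0 -i level0.patch             # apply for real
```

With `--dry-run` you can probe `-p0`, then `-p1`, and only apply once a level reports no failed hunks.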
org.apache.hadoop.io.compress.SnappyCodec not found
Hi,

I use Hadoop 2.4.1 and HBase 0.98.5 with snappy enabled in both Hadoop and HBase. With the default settings in Spark 1.0.2, when trying to load a file I got "Class org.apache.hadoop.io.compress.SnappyCodec not found". Can you please advise how to enable snappy in Spark?

Regards
Arthur

scala> inFILE.first()
java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:158) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:171) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:202) at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:202) at org.apache.spark.rdd.RDD.take(RDD.scala:983) at org.apache.spark.rdd.RDD.first(RDD.scala:1015) at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15) at $iwC$$iwC$$iwC.<init>(<console>:20) at $iwC$$iwC.<init>(<console>:22) at $iwC.<init>(<console>:24) at <init>(<console>:26) at .<init>(<console>:30) at .<clinit>(<console>) at .<init>(<console>:7) at .<clinit>(<console>) at $print(<console>) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788) at
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608) at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611) at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936) at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884) at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 55 more Caused by: java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.SnappyCodec not found. at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135) at
Re: org.apache.hadoop.io.compress.SnappyCodec not found
Hi,

My "hadoop checknative" result:

14/08/29 02:54:51 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
14/08/29 02:54:51 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64/libhadoop.so
zlib:   true /lib64/libz.so.1
snappy: true /mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64/libsnappy.so.1
lz4:    true revision:99
bzip2:  false

Any idea how to enable or disable snappy in Spark?

Regards
Arthur
Re: org.apache.hadoop.io.compress.SnappyCodec not found
Hi,

If I change my etc/hadoop/core-site.xml from

<property>
  <name>io.compression.codecs</name>
  <value>
    org.apache.hadoop.io.compress.SnappyCodec,
    org.apache.hadoop.io.compress.GzipCodec,
    org.apache.hadoop.io.compress.DefaultCodec,
    org.apache.hadoop.io.compress.BZip2Codec
  </value>
</property>

to

<property>
  <name>io.compression.codecs</name>
  <value>
    org.apache.hadoop.io.compress.GzipCodec,
    org.apache.hadoop.io.compress.SnappyCodec,
    org.apache.hadoop.io.compress.DefaultCodec,
    org.apache.hadoop.io.compress.BZip2Codec
  </value>
</property>

and run the test again, this time it cannot find org.apache.hadoop.io.compress.GzipCodec:

Caused by: java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.GzipCodec not found.
 at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
 at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
 at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
 ... 60 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.io.compress.GzipCodec not found
 at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
 at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
 ... 62 more

Any idea to fix this issue?

Regards
Arthur
Re: org.apache.hadoop.io.compress.SnappyCodec not found
Hi,

I fixed the issue by copying libsnappy.so into the Java JRE.

Regards
Arthur

On 29 Aug, 2014, at 8:12 am, arthur.hk.c...@gmail.com wrote:

Hi,

If I change my etc/hadoop/core-site.xml so that org.apache.hadoop.io.compress.GzipCodec is listed first in io.compression.codecs (instead of org.apache.hadoop.io.compress.SnappyCodec) and run the test again, this time it cannot find org.apache.hadoop.io.compress.GzipCodec.
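Copying libsnappy.so into the JRE works because it puts the library on the JVM's default java.library.path. An alternative that avoids touching the JRE is to point Spark at Hadoop's native library directory and jars instead. This is a config sketch, not from the thread: the path matches the checknative output above, and the Spark 1.0.x variables SPARK_LIBRARY_PATH and SPARK_CLASSPATH are assumed to be appropriate for your deployment; adjust to your install.

```shell
# conf/spark-env.sh (sketch; HADOOP_HOME is this cluster's path, adjust as needed)
export HADOOP_HOME=/mnt/hadoop/hadoop-2.4.1_snappy
# Native libs (libhadoop.so, libsnappy.so) for the driver and executor JVMs
export SPARK_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
# Hadoop jars, which contain org.apache.hadoop.io.compress.SnappyCodec
export SPARK_CLASSPATH=$(hadoop classpath)
```

Note that `hadoop checknative` only proves the Hadoop CLI can load libsnappy; the Spark JVMs have their own library path, which is why the codec class can load while its native half fails.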
Spark Hive max key length is 767 bytes
(Please ignore if duplicated)

Hi,

I use Spark 1.0.2 with Hive 0.13.1. I have already set the Hive MySQL database to latin1:

mysql> alter database hive character set latin1;

Spark:

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> hiveContext.hql("create table test_datatype1 (testbigint bigint)")
scala> hiveContext.hql("drop table test_datatype1")

14/08/29 12:31:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
14/08/29 12:31:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
14/08/29 12:31:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
14/08/29 12:31:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
14/08/29 12:31:59 ERROR DataNucleus.Datastore: An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)
 at com.mysql.jdbc.Util.getInstance(Util.java:383)
 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1062)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4226)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4158)

Can you please advise what could be wrong?

Regards
Arthur
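For context on the error: with a MySQL-backed metastore, DataNucleus creates VARCHAR(767) index columns, and under a multi-byte charset such as UTF-8 each character can occupy 3+ bytes, overflowing InnoDB's 767-byte index-key limit. Crucially, `ALTER DATABASE ... CHARACTER SET latin1` only changes the default for tables created afterwards; tables DataNucleus already created keep their original charset. A hedged sketch (the database name "hive", the login, and the example table name are assumptions; back up the metastore first):

```shell
# Convert the metastore DB default *and* any already-created tables to
# latin1, so both new and existing index columns fit within 767 bytes.
mysql -u root -p <<'SQL'
ALTER DATABASE hive CHARACTER SET latin1;
-- Repeat for each table DataNucleus has already created, e.g.:
-- ALTER TABLE hive.COLUMNS_V2 CONVERT TO CHARACTER SET latin1;
SQL
```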
Compilation Error: Spark 1.0.2 with HBase 0.98
Hi,

I need to use Spark with HBase 0.98 and tried to compile Spark 1.0.2 against it. My steps:

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2.tgz
tar -vxf spark-1.0.2.tgz
cd spark-1.0.2

Edit project/SparkBuild.scala to set HBASE_VERSION:

// HBase version; set as appropriate.
val HBASE_VERSION = "0.98.2"

Edit pom.xml with the following values:

<hadoop.version>2.4.1</hadoop.version>
<protobuf.version>2.5.0</protobuf.version>
<yarn.version>${hadoop.version}</yarn.version>
<hbase.version>0.98.5</hbase.version>
<zookeeper.version>3.4.6</zookeeper.version>
<hive.version>0.13.1</hive.version>

SPARK_HADOOP_VERSION=2.4.1 SPARK_YARN=true sbt/sbt clean assembly

but it fails because of UNRESOLVED DEPENDENCIES hbase;0.98.2. Can you please advise how to compile Spark 1.0.2 with HBase 0.98? Or should I set HBASE_VERSION back to "0.94.6"?

Regards
Arthur

[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] ::          UNRESOLVED DEPENDENCIES          ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.apache.hbase#hbase;0.98.2: not found
[warn] ::::::::::::::::::::::::::::::::::::::::::::::

sbt.ResolveException: unresolved dependency: org.apache.hbase#hbase;0.98.2: not found at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:217) at sbt.IvyActions$$anonfun$update$1.apply(IvyActions.scala:126) at sbt.IvyActions$$anonfun$update$1.apply(IvyActions.scala:125) at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:116) at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:116) at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:104) at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:51) at sbt.IvySbt$$anon$3.call(Ivy.scala:60) at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:98) at xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:81) at xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:102) at xsbt.boot.Using$.withResource(Using.scala:11) at xsbt.boot.Using$.apply(Using.scala:10) at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:62) at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:52) at xsbt.boot.Locks$.apply0(Locks.scala:31) at
xsbt.boot.Locks$.apply(Locks.scala:28) at sbt.IvySbt.withDefaultLogger(Ivy.scala:60) at sbt.IvySbt.withIvy(Ivy.scala:101) at sbt.IvySbt.withIvy(Ivy.scala:97) at sbt.IvySbt$Module.withModule(Ivy.scala:116) at sbt.IvyActions$.update(IvyActions.scala:125) at sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1170) at sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1168) at sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$73.apply(Defaults.scala:1191) at sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$73.apply(Defaults.scala:1189) at sbt.Tracked$$anonfun$lastOutput$1.apply(Tracked.scala:35) at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1193) at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1188) at sbt.Tracked$$anonfun$inputChanged$1.apply(Tracked.scala:45) at sbt.Classpaths$.cachedUpdate(Defaults.scala:1196) at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1161) at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1139) at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47) at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:42) at sbt.std.Transform$$anon$4.work(System.scala:64) at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:237) at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:237) at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:18) at sbt.Execute.work(Execute.scala:244) at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:237) at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:237) at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:160) at sbt.CompletionService$$anon$2.call(CompletionService.scala:30) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at 
java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) [error] (examples/*:update) sbt.ResolveException: unresolved dependency: org.apache.hbase#hbase;0.98.2: not found [error]
Re: Compilation Error: Spark 1.0.2 with HBase 0.98
Hi Ted,

Thank you so much!! As I am new to Spark, can you please advise the steps for applying this patch to my spark-1.0.2 source folder?

Regards
Arthur

On 28 Aug, 2014, at 10:13 am, Ted Yu yuzhih...@gmail.com wrote:

See SPARK-1297. The pull request is here: https://github.com/apache/spark/pull/1893
Re: Compilation Error: Spark 1.0.2 with HBase 0.98
Hi Ted,

I tried the following steps to apply patch 1893 but got "Hunk FAILED". Can you please advise how to get through this error? Or is my spark-1.0.2 source not the correct one?

Regards
Arthur

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2.tgz
tar -vxf spark-1.0.2.tgz
cd spark-1.0.2
wget https://github.com/apache/spark/pull/1893.patch
patch 1893.patch

patching file pom.xml
Hunk #1 FAILED at 45.
Hunk #2 FAILED at 110.
2 out of 2 hunks FAILED -- saving rejects to file pom.xml.rej
patching file pom.xml
Hunk #1 FAILED at 54.
Hunk #2 FAILED at 72.
Hunk #3 FAILED at 171.
3 out of 3 hunks FAILED -- saving rejects to file pom.xml.rej
can't find file to patch at input line 267
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
--
|From cd58437897bf02b644c2171404ccffae5d12a2be Mon Sep 17 00:00:00 2001
|From: tedyu yuzhih...@gmail.com
|Date: Mon, 11 Aug 2014 15:57:46 -0700
|Subject: [PATCH 3/4] SPARK-1297 Upgrade HBase dependency to 0.98 - add
| description to building-with-maven.md
|
|---
| docs/building-with-maven.md | 3 +++
| 1 file changed, 3 insertions(+)
|
|diff --git a/docs/building-with-maven.md b/docs/building-with-maven.md
|index 672d0ef..f8bcd2b 100644
|--- a/docs/building-with-maven.md
|+++ b/docs/building-with-maven.md
--
File to patch:

On 28 Aug, 2014, at 10:24 am, Ted Yu yuzhih...@gmail.com wrote:

You can get the patch from this URL:
https://github.com/apache/spark/pull/1893.patch

BTW 0.98.5 has been released - you can specify 0.98.5-hadoop2 in the pom.xml

Cheers

On Wed, Aug 27, 2014 at 7:18 PM, arthur.hk.c...@gmail.com wrote:

Hi Ted,

Thank you so much!! As I am new to Spark, can you please advise the steps to apply this patch to my spark-1.0.2 source folder?
Regards
Arthur

On 28 Aug, 2014, at 10:13 am, Ted Yu yuzhih...@gmail.com wrote:

See SPARK-1297. The pull request is here:
https://github.com/apache/spark/pull/1893

On Wed, Aug 27, 2014 at 6:57 PM, arthur.hk.c...@gmail.com wrote:

(correction: "Compilation Error: Spark 1.0.2 with HBase 0.98", please ignore if duplicated)

Hi,

I need to use Spark with HBase 0.98, so I tried to compile Spark 1.0.2 against HBase 0.98. My steps:

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2.tgz
tar -vxf spark-1.0.2.tgz
cd spark-1.0.2

Edit project/SparkBuild.scala and set HBASE_VERSION:
// HBase version; set as appropriate.
val HBASE_VERSION = "0.98.2"

Edit pom.xml with the following values:
<hadoop.version>2.4.1</hadoop.version>
<protobuf.version>2.5.0</protobuf.version>
<yarn.version>${hadoop.version}</yarn.version>
<hbase.version>0.98.5</hbase.version>
<zookeeper.version>3.4.6</zookeeper.version>
<hive.version>0.13.1</hive.version>

SPARK_HADOOP_VERSION=2.4.1 SPARK_YARN=true sbt/sbt clean assembly

but it fails because of UNRESOLVED DEPENDENCIES hbase;0.98.2. Can you please advise how to compile Spark 1.0.2 with HBase 0.98? Or should I set HBASE_VERSION back to "0.94.6"?
Regards
Arthur

[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] ::          UNRESOLVED DEPENDENCIES         ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.apache.hbase#hbase;0.98.2: not found
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
sbt.ResolveException: unresolved dependency: org.apache.hbase#hbase;0.98.2: not found
	at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:217)
	...
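[Editorial note on the resolution failure itself, as a hypothesis: HBase was split into per-module artifacts starting with the 0.96 line, so a monolithic org.apache.hbase#hbase artifact is not published for 0.98.x at all, and no repository setting will make hbase;0.98.2 resolvable. If SparkBuild.scala is edited by hand rather than applying the pull request being discussed, the dependency would need to reference the split modules. The sbt fragment below is an illustrative sketch only, not the actual content of PR 1893:]

```scala
// Sketch (assumption): HBase >= 0.96 publishes split modules instead of a
// single "hbase" artifact, so the examples build would need something like:
libraryDependencies ++= Seq(
  "org.apache.hbase" % "hbase-common" % "0.98.5-hadoop2",
  "org.apache.hbase" % "hbase-client" % "0.98.5-hadoop2",
  "org.apache.hbase" % "hbase-server" % "0.98.5-hadoop2"
)
```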
Re: Compilation Error: Spark 1.0.2 with HBase 0.98
Hi Ted,

Thanks. Tried "patch -p1 -i 1893.patch" but still got "Hunk #1 FAILED at 45." Is this normal?

Regards
Arthur

patch -p1 -i 1893.patch
patching file examples/pom.xml
Hunk #1 FAILED at 45.
Hunk #2 succeeded at 94 (offset -16 lines).
1 out of 2 hunks FAILED -- saving rejects to file examples/pom.xml.rej
patching file examples/pom.xml
Hunk #1 FAILED at 54.
Hunk #2 FAILED at 72.
Hunk #3 succeeded at 122 (offset -49 lines).
2 out of 3 hunks FAILED -- saving rejects to file examples/pom.xml.rej
patching file docs/building-with-maven.md
patching file examples/pom.xml
Hunk #1 succeeded at 122 (offset -40 lines).
Hunk #2 succeeded at 195 (offset -40 lines).

On 28 Aug, 2014, at 10:53 am, Ted Yu yuzhih...@gmail.com wrote:

Can you use this command?

patch -p1 -i 1893.patch

Cheers

On Wed, Aug 27, 2014 at 7:41 PM, arthur.hk.c...@gmail.com wrote:
[quoted message elided]
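[Editorial note: the -p1 flag Ted suggests is needed because git-format patches prefix every path with a/ and b/, so one leading path component must be stripped before the file names match the source tree. The remaining "Hunk FAILED" lines most likely mean the PR was generated against a different branch than the spark-1.0.2 sources, which -p1 cannot fix; those rejects need manual merging. A minimal self-contained demo of the path-stripping part, using toy files rather than the real PR 1893:]

```shell
# Create a tiny source tree and a git-style patch against it.
mkdir -p demo/docs
printf 'old line\n' > demo/docs/file.txt
cat > demo/fix.patch <<'EOF'
--- a/docs/file.txt
+++ b/docs/file.txt
@@ -1 +1 @@
-old line
+new line
EOF

# -p1 strips the leading "a/"/"b/" component so "docs/file.txt"
# resolves relative to the tree root.
( cd demo && patch -p1 -i fix.patch )

cat demo/docs/file.txt   # prints "new line"
```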
Compilation FAILURE: Spark 1.0.2 / Project Hive (0.13.1)
Hi,

I use Hadoop 2.4.1, HBase 0.98.5, ZooKeeper 3.4.6 and Hive 0.13.1. I just tried to compile Spark 1.0.2 but got an error on Spark Project Hive. Can you please advise which repository has org.spark-project.hive:hive-metastore:jar:0.13.1?

FYI, below is my repository setting in Maven, which may be out of date:

<repository>
  <id>nexus</id>
  <name>local private nexus</name>
  <url>http://maven.oschina.net/content/groups/public/</url>
  <releases><enabled>true</enabled></releases>
  <snapshots><enabled>false</enabled></snapshots>
</repository>

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.1 -DskipTests clean package

[INFO] Reactor Summary:
[INFO] Spark Project Parent POM .......................... SUCCESS [1.892s]
[INFO] Spark Project Core ................................ SUCCESS [1:33.698s]
[INFO] Spark Project Bagel ............................... SUCCESS [12.270s]
[INFO] Spark Project GraphX .............................. SUCCESS [2:16.343s]
[INFO] Spark Project ML Library .......................... SUCCESS [4:18.495s]
[INFO] Spark Project Streaming ........................... SUCCESS [39.765s]
[INFO] Spark Project Tools ............................... SUCCESS [9.173s]
[INFO] Spark Project Catalyst ............................ SUCCESS [35.462s]
[INFO] Spark Project SQL ................................. SUCCESS [1:16.118s]
[INFO] Spark Project Hive ................................ FAILURE [1:36.816s]
[INFO] Spark Project REPL ................................ SKIPPED
[INFO] Spark Project YARN Parent POM ..................... SKIPPED
[INFO] Spark Project YARN Stable API ..................... SKIPPED
[INFO] Spark Project Assembly ............................ SKIPPED
[INFO] Spark Project External Twitter .................... SKIPPED
[INFO] Spark Project External Kafka ...................... SKIPPED
[INFO] Spark Project External Flume ..................... SKIPPED
[INFO] Spark Project External ZeroMQ .................... SKIPPED
[INFO] Spark Project External MQTT ...................... SKIPPED
[INFO] Spark Project Examples ........................... SKIPPED
[INFO] BUILD FAILURE
[INFO] Total time: 12:40.616s
[INFO] Finished at: Thu Aug 28 11:41:07 HKT 2014
[INFO] Final Memory: 37M/826M
[ERROR] Failed to execute goal on project spark-hive_2.10: Could not resolve dependencies for project org.apache.spark:spark-hive_2.10:jar:1.0.2: The following artifacts could not be resolved: org.spark-project.hive:hive-metastore:jar:0.13.1, org.spark-project.hive:hive-exec:jar:0.13.1, org.spark-project.hive:hive-serde:jar:0.13.1: Could not find artifact org.spark-project.hive:hive-metastore:jar:0.13.1 in nexus-osc (http://maven.oschina.net/content/groups/public/) -> [Help 1]

Regards
Arthur
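[Editorial note, as a hypothesis: the build above resolves everything through a single mirror (maven.oschina.net), so an artifact the mirror has not synced will fail even if it exists on Maven Central. One thing to try is letting Maven fall back to Central by adding it alongside the private nexus in ~/.m2/settings.xml. This is an illustrative sketch, not a verified fix, and it will not help if the coordinate genuinely does not exist for this hive.version:]

```xml
<!-- Sketch: add Maven Central as a fallback repository so artifacts
     missing from the private mirror can still be resolved. -->
<repositories>
  <repository>
    <id>central</id>
    <name>Maven Central</name>
    <url>https://repo1.maven.org/maven2</url>
    <releases><enabled>true</enabled></releases>
    <snapshots><enabled>false</enabled></snapshots>
  </repository>
</repositories>
```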