Re: OOM with groupBy + saveAsTextFile

2014-11-01 Thread arthur.hk.c...@gmail.com
Hi, FYI as follows. Could you post your heap size settings as well as your Spark app code? Regards Arthur 3.1.3 Detail Message: Requested array size exceeds VM limit The detail message Requested array size exceeds VM limit indicates that the application (or APIs used by that application)
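A common way to avoid this class of OOM with `groupBy` is to aggregate on the map side instead of materializing every value of a key in memory. A minimal sketch, assuming an existing `SparkContext` named `sc`; the input path and key extraction are illustrative:

```scala
// OOM-prone pattern: groupByKey buffers all values of a key in memory
// before the aggregation runs.
val pairs = sc.textFile("hdfs:///input/data.txt")   // illustrative path
  .map(line => (line.split(",")(0), 1L))            // (key, 1) pairs

// val counts = pairs.groupByKey().mapValues(_.sum) // avoid for large keys

// Safer: reduceByKey merges partial sums per key on the map side,
// so no single key's full value list is ever held in memory.
val counts = pairs.reduceByKey(_ + _)
counts.saveAsTextFile("hdfs:///output/counts")
```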

Spark 1.1.0 on Hive 0.13.1

2014-10-29 Thread arthur.hk.c...@gmail.com
Hi, My Hive is 0.13.1, how to make Spark 1.1.0 run on Hive 0.13? Please advise. Or, any news about when Spark 1.1.0 on Hive 0.13.1 will be available? Regards Arthur

Re: Spark 1.1.0 on Hive 0.13.1

2014-10-29 Thread arthur.hk.c...@gmail.com
branch. On 10/29/14 7:43 PM, arthur.hk.c...@gmail.com wrote: Hi, My Hive is 0.13.1, how to make Spark 1.1.0 run on Hive 0.13? Please advise. Or, any news about when Spark 1.1.0 on Hive 0.13.1 will be available? Regards Arthur

Re: Spark/HIVE Insert Into values Error

2014-10-26 Thread arthur.hk.c...@gmail.com
, arthur.hk.c...@gmail.com wrote: Hi, When trying to insert records into HIVE, I got an error. My Spark is 1.1.0 and Hive 0.12.0. Any idea what would be wrong? Regards Arthur hive CREATE TABLE students (name VARCHAR(64), age INT, gpa int); OK hive INSERT INTO TABLE

Re: Spark 1.1.0 and Hive 0.12.0 Compatibility Issue

2014-10-24 Thread arthur.hk.c...@gmail.com
...@databricks.com wrote: Can you show the DDL for the table? It looks like the SerDe might be saying it will produce a decimal type but is actually producing a string. On Thu, Oct 23, 2014 at 3:17 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi My Spark is 1.1.0 and Hive is 0.12

Re: Spark: Order by Failed, java.lang.NullPointerException

2014-10-24 Thread arthur.hk.c...@gmail.com
. Thanks Best Regards On Thu, Oct 23, 2014 at 5:59 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I got java.lang.NullPointerException. Please help! sqlContext.sql(select l_orderkey, l_linenumber, l_partkey, l_quantity, l_shipdate, L_RETURNFLAG, L_LINESTATUS from lineitem

Re: Spark Hive Snappy Error

2014-10-23 Thread arthur.hk.c...@gmail.com
-snappy-0.0.1-SNAPSHOT.jar But for spark itself, it depends on snappy-0.2.jar. Is there any possibility that this problem caused by different version of snappy? Thanks Jerry From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com] Sent: Thursday, October 23, 2014 11:32 AM

Aggregation Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException:

2014-10-23 Thread arthur.hk.c...@gmail.com
Hi, I got $TreeNodeException, a few questions: Q1) How should I do aggregation in Spark? Can I use aggregation directly in SQL? or Q2) Should I use SQL to load the data to form an RDD, then use Scala to do the aggregation? Regards Arthur MySQL (good one, without aggregation):
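Aggregation can be expressed directly in SQL against a registered table, which answers the first question. A hedged sketch for Spark 1.1.0, using the LINEITEM columns that appear elsewhere in this thread and assuming `sqlContext` is a `HiveContext`:

```scala
// Aggregate directly in SQL; no need to drop to raw RDD operations.
val result = sqlContext.sql("""
  SELECT l_returnflag, l_linestatus,
         SUM(l_quantity)      AS sum_qty,
         SUM(l_extendedprice) AS sum_price
  FROM lineitem
  GROUP BY l_returnflag, l_linestatus
""")
result.collect().foreach(println)
```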

Re: Aggregation Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException:

2014-10-23 Thread arthur.hk.c...@gmail.com
Arthur On 23 Oct, 2014, at 9:36 pm, Yin Huai huaiyin@gmail.com wrote: Hello Arthur, You can do aggregations in SQL. How did you create LINEITEM? Thanks, Yin On Thu, Oct 23, 2014 at 8:54 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I got

Spark 1.1.0 and Hive 0.12.0 Compatibility Issue

2014-10-23 Thread arthur.hk.c...@gmail.com
(Please ignore if duplicated) Hi, My Spark is 1.1.0 and Hive is 0.12. I tried to run the same query in both Hive 0.12.0 and Spark 1.1.0; HiveQL works while Spark SQL fails. hive select l_orderkey, sum(l_extendedprice*(1-l_discount)) as revenue, o_orderdate, o_shippriority from customer c

Re: Spark Hive Snappy Error

2014-10-22 Thread arthur.hk.c...@gmail.com
From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com] Sent: Friday, October 17, 2014 7:13 AM To: user Cc: arthur.hk.c...@gmail.com Subject: Spark Hive Snappy Error Hi, When trying Spark with Hive table, I got the “java.lang.UnsatisfiedLinkError

Re: Spark Hive Snappy Error

2014-10-22 Thread arthur.hk.c...@gmail.com
...@gmail.com [mailto:arthur.hk.c...@gmail.com] Sent: Wednesday, October 22, 2014 8:35 PM To: Shao, Saisai Cc: arthur.hk.c...@gmail.com; user Subject: Re: Spark Hive Snappy Error Hi, Yes, I can always reproduce the issue: about your workload, Spark configuration, JDK version and OS version

ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId

2014-10-22 Thread arthur.hk.c...@gmail.com
Hi, I just tried sample PI calculation on Spark Cluster, after returning the Pi result, it shows ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(m37,35662) not found ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://m33:7077
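For context, the full form of a SparkPi submission against that standalone master would look roughly like this; the examples jar path and version are illustrative and depend on the local build:

```shell
# Submit the built-in Pi estimator to the standalone master from the thread.
# The jar location varies by build; this path is an assumption.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://m33:7077 \
  lib/spark-examples-1.1.0-hadoop2.4.1.jar \
  100   # number of slices (tasks) for the Pi estimate
```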

Re: ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId

2014-10-22 Thread arthur.hk.c...@gmail.com
Hi, I have managed to resolve it; it was caused by a wrong setting. Please ignore this. Regards Arthur On 23 Oct, 2014, at 5:14 am, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: 14/10/23 05:09:04 WARN ConnectionManager: All connections not cleaned up

Spark: Order by Failed, java.lang.NullPointerException

2014-10-22 Thread arthur.hk.c...@gmail.com
Hi, I got java.lang.NullPointerException. Please help! sqlContext.sql(select l_orderkey, l_linenumber, l_partkey, l_quantity, l_shipdate, L_RETURNFLAG, L_LINESTATUS from lineitem limit 10).collect().foreach(println); 2014-10-23 08:20:12,024 INFO

Re: Spark Hive Snappy Error

2014-10-22 Thread arthur.hk.c...@gmail.com
Hi, Please find the attached file (lsof.rtf).

Re: Spark Hive Snappy Error

2014-10-22 Thread arthur.hk.c...@gmail.com
Hi May I know where to configure Spark to load libhadoop.so? Regards Arthur On 23 Oct, 2014, at 11:31 am, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, Please find the attached file. lsof.rtf my spark-default.xml # Default system properties included when running

Spark/HIVE Insert Into values Error

2014-10-17 Thread arthur.hk.c...@gmail.com
Hi, When trying to insert records into HIVE, I got an error. My Spark is 1.1.0 and Hive 0.12.0. Any idea what would be wrong? Regards Arthur hive CREATE TABLE students (name VARCHAR(64), age INT, gpa int); OK hive INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1);
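`INSERT INTO ... VALUES` was only added to Hive in 0.14 (HIVE-5317), so it is expected to fail on Hive 0.12. The usual workaround on older versions is to insert through a SELECT; a sketch, where `some_existing_table` is a hypothetical non-empty table used only to drive the one-row SELECT:

```sql
-- Hive 0.12 has no INSERT ... VALUES; load rows through a SELECT instead.
CREATE TABLE students (name VARCHAR(64), age INT, gpa INT);

-- One-row insert via a dummy SELECT over any existing non-empty table:
INSERT INTO TABLE students
SELECT 'fred flintstone', 35, 1 FROM some_existing_table LIMIT 1;
```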

Spark Hive Snappy Error

2014-10-16 Thread arthur.hk.c...@gmail.com
Hi, When trying Spark with Hive table, I got the “java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I” error, val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) sqlContext.sql(“select count(1) from q8_national_market_share sqlContext.sql(select

Re: How To Implement More Than One Subquery in Scala/Spark

2014-10-13 Thread arthur.hk.c...@gmail.com
that another useful technique is to execute the groupByKey routine, particularly if you want to operate on a particular variable. On Oct 11, 2014 11:09 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, My Spark version is v1.1.0 and Hive is 0.12.0, I need to use more than 1

How To Implement More Than One Subquery in Scala/Spark

2014-10-11 Thread arthur.hk.c...@gmail.com
Hi, My Spark version is v1.1.0 and Hive is 0.12.0, I need to use more than 1 subquery in my Spark SQL, below are my sample table structures and a SQL that contains more than 1 subquery. Question 1: How to load a HIVE table into Scala/Spark? Question 2: How to implement a
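One pattern for composing more than one subquery in Spark 1.1.0 is to register each intermediate result as its own temp table and query it again, flattening the nesting into steps. A sketch assuming a `HiveContext` named `sqlContext` and the Hive table `lineitem`; the intermediate table name and filter value are illustrative:

```scala
// Question 1: a Hive table is "loaded" simply by querying it through
// the HiveContext; the result is a SchemaRDD.
val step1 = sqlContext.sql(
  "SELECT l_orderkey, SUM(l_quantity) AS qty FROM lineitem GROUP BY l_orderkey")

// Question 2: register the intermediate result as a temp table and
// query it like any other table -- this stands in for a nested subquery.
step1.registerTempTable("order_qty")
val step2 = sqlContext.sql("SELECT l_orderkey FROM order_qty WHERE qty > 300")
step2.collect().foreach(println)
```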

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread arthur.hk.c...@gmail.com
Wonderful !! On 11 Oct, 2014, at 12:00 am, Nan Zhu zhunanmcg...@gmail.com wrote: Great! Congratulations! -- Nan Zhu On Friday, October 10, 2014 at 11:19 AM, Mridul Muralidharan wrote: Brilliant stuff ! Congrats all :-) This is indeed really heartening news ! Regards, Mridul

How to save Spark log into file

2014-10-03 Thread arthur.hk.c...@gmail.com
Hi, How can the spark log be saved into file instead of showing them on console? Below is my conf/log4j.properties conf/log4j.properties ### # Root logger option log4j.rootLogger=INFO, file # Direct log messages to a log file log4j.appender.file=org.apache.log4j.RollingFileAppender #Redirect
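Completing that idea, a minimal `conf/log4j.properties` that writes to a rolling file instead of the console might look like this; the log path and size limits are illustrative:

```properties
# Root logger writes to the "file" appender only (no console appender).
log4j.rootLogger=INFO, file

# Direct log messages to a rolling log file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/spark/spark.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
```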

object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread arthur.hk.c...@gmail.com
Hi, I have tried to to run HBaseTest.scala, but I got following errors, any ideas to how to fix them? Q1) scala package org.apache.spark.examples console:1: error: illegal start of definition package org.apache.spark.examples Q2) scala import

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread arthur.hk.c...@gmail.com
advise. Regards Arthur On 14 Sep, 2014, at 10:48 pm, Ted Yu yuzhih...@gmail.com wrote: Spark examples builds against hbase 0.94 by default. If you want to run against 0.98, see SPARK-1297: https://issues.apache.org/jira/browse/SPARK-1297 Cheers On Sun, Sep 14, 2014 at 7:36 AM, arthur.hk.c...@gmail.com arthur.hk.c

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread arthur.hk.c...@gmail.com
, Ted Yu yuzhih...@gmail.com wrote: spark-1297-v5.txt is level 0 patch Please use spark-1297-v5.txt Cheers On Sun, Sep 14, 2014 at 8:06 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, Thanks!! I tried to apply the patches, both spark-1297-v2.txt and spark-1297-v4

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread arthur.hk.c...@gmail.com
Hi, My bad. Tried again, worked. patch -p0 -i spark-1297-v5.txt patching file docs/building-with-maven.md patching file examples/pom.xml Thanks! Arthur On 14 Sep, 2014, at 11:38 pm, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, Thanks! patch -p0 -i spark-1297-v5.txt

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread arthur.hk.c...@gmail.com
on master branch without rejects. If you use spark 1.0.2, use pom.xml attached to the JIRA. On Sun, Sep 14, 2014 at 8:38 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, Thanks! patch -p0 -i spark-1297-v5.txt patching file docs/building-with-maven.md patching file examples

unable to create new native thread

2014-09-11 Thread arthur.hk.c...@gmail.com
Hi, I am trying the Spark sample program “SparkPi” and got an “unable to create new native thread” error; how can I resolve this? 14/09/11 21:36:16 INFO scheduler.DAGScheduler: Completed ResultTask(0, 644) 14/09/11 21:36:16 INFO scheduler.TaskSetManager: Finished TID 643 in 43 ms on node1 (progress:
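“unable to create new native thread” is raised when the OS refuses to give the JVM another thread, usually a per-user process/thread cap rather than a heap problem. A hedged checklist of commands; the user name and limits are examples to tune for your cluster:

```shell
# Check the current per-user process/thread limit for the user running Spark.
ulimit -u

# Raise it for the current session (example value):
ulimit -u 32768

# To persist it, add entries to /etc/security/limits.conf (illustrative):
#   sparkuser  soft  nproc  32768
#   sparkuser  hard  nproc  32768
```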

Re: Dependency Problem with Spark / ScalaTest / SBT

2014-09-10 Thread arthur.hk.c...@gmail.com
Hi, What is your SBT command and the parameters? Arthur On 10 Sep, 2014, at 6:46 pm, Thorsten Bergler sp...@tbonline.de wrote: Hello, I am writing a Spark App which is already working so far. Now I started to build also some UnitTests, but I am running into some dependency problems and

Re: Spark SQL -- more than two tables for join

2014-09-10 Thread arthur.hk.c...@gmail.com
Hi, Maybe you can take a look at the following. http://databricks.com/blog/2014/03/26/spark-sql-manipulating-structured-data-using-spark-2.html Good luck. Arthur On 10 Sep, 2014, at 9:09 pm, arunshell87 shell.a...@gmail.com wrote: Hi, I too had tried SQL queries with joins, MINUS ,

Spark and Shark

2014-09-01 Thread arthur.hk.c...@gmail.com
Hi, I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling from source). spark: 1.0.2 shark: 0.9.2 hadoop: 2.4.1 java: java version “1.7.0_67” protobuf: 2.5.0 I have tried the smoke test in shark but got “java.util.NoSuchElementException” error, can you please advise

Re: Spark Hive max key length is 767 bytes

2014-08-30 Thread arthur.hk.c...@gmail.com
size between 12 and 13. On Fri, Aug 29, 2014 at 4:38 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, Tried the same thing in HIVE directly without issue: HIVE: hive create table test_datatype2 (testbigint bigint ); OK Time taken: 0.708 seconds hive drop table

Re: Spark Hive max key length is 767 bytes

2014-08-30 Thread arthur.hk.c...@gmail.com
into an issue with your MySQL setup actually, try running alter database metastore_db character set latin1 so that way Hive (and the Spark HiveContext) can execute properly against the metastore. On August 29, 2014 at 04:39:01, arthur.hk.c...@gmail.com (arthur.hk.c...@gmail.com

Spark Master/Slave and HA

2014-08-30 Thread arthur.hk.c...@gmail.com
Hi, I have a few questions about Spark Master and Slave setup: Here, I have 5 Hadoop nodes (n1, n2, n3, n4, and n5 respectively), at the moment I run Spark on these nodes: n1: Hadoop Active Name node, Hadoop Slave Spark Active Master

Spark and Shark Node: RAM Allocation

2014-08-30 Thread arthur.hk.c...@gmail.com
Hi, Is there any formula to calculate proper RAM allocation values for Spark and Shark based on Physical RAM, HADOOP and HBASE RAM usage? e.g. if a node has 32GB physical RAM spark-defaults.conf spark.executor.memory ?g spark-env.sh export SPARK_WORKER_MEMORY=? export
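There is no single formula, but a common rule of thumb is to leave headroom for the OS and the Hadoop/HBase daemons and give the remainder to the Spark worker. A hedged example split for a 32 GB node; all numbers are illustrative, not a recommendation:

```shell
# spark-env.sh -- illustrative split for a 32 GB node:
# ~8 GB reserved for OS + HDFS/HBase daemons, ~24 GB for the Spark worker.
export SPARK_WORKER_MEMORY=24g
export SPARK_WORKER_CORES=8

# spark-defaults.conf -- each executor heap must fit inside worker memory:
# spark.executor.memory   8g
```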

Re: Spark Hive max key length is 767 bytes

2014-08-29 Thread arthur.hk.c...@gmail.com
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) Can anyone please help? Regards Arthur On 29 Aug, 2014, at 12:47 pm, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: (Please ignore if duplicated) Hi, I use Spark 1.0.2 with Hive 0.13.1 I have already set the hive

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-28 Thread arthur.hk.c...@gmail.com
, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi Ted, Thanks. Tried [patch -p1 -i 1893.patch] (Hunk #1 FAILED at 45.) Is this normal? Regards Arthur patch -p1 -i 1893.patch patching file examples/pom.xml Hunk #1 FAILED at 45. Hunk #2 succeeded at 94 (offset -16 lines). 1 out of 2 hunks FAILED

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-28 Thread arthur.hk.c...@gmail.com
pm, Ted Yu yuzhih...@gmail.com wrote: I see 0.98.5 in dep.txt You should be good to go. On Thu, Aug 28, 2014 at 3:16 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, tried mvn -Phbase-hadoop2,hadoop-2.4,yarn -Dhadoop.version=2.4.1 -DskipTests dependency:tree

SPARK-1297 patch error (spark-1297-v4.txt )

2014-08-28 Thread arthur.hk.c...@gmail.com
Hi, I have just tried to apply the patch of SPARK-1297: https://issues.apache.org/jira/browse/SPARK-1297 There are two files in it, named spark-1297-v2.txt and spark-1297-v4.txt respectively. When applying the 2nd one, I got Hunk #1 FAILED at 45 Can you please advise how to fix it in order

Re: SPARK-1297 patch error (spark-1297-v4.txt )

2014-08-28 Thread arthur.hk.c...@gmail.com
/building-with-maven.md |+++ docs/building-with-maven.md -- File to patch: Please advise Regards Arthur On 29 Aug, 2014, at 12:50 am, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I have just tried to apply the patch of SPARK-1297: https

Re: SPARK-1297 patch error (spark-1297-v4.txt )

2014-08-28 Thread arthur.hk.c...@gmail.com
=hadoop2 -Phadoop-2.4,yarn -Dhadoop.version=2.4.1 -DskipTests clean package Patch v5 is @ level 0 - you don't need to use -p1 in the patch command. Cheers On Thu, Aug 28, 2014 at 9:50 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I have just tried to apply

org.apache.hadoop.io.compress.SnappyCodec not found

2014-08-28 Thread arthur.hk.c...@gmail.com
Hi, I use Hadoop 2.4.1 and HBase 0.98.5 with snappy enabled in both Hadoop and HBase. With default setting in Spark 1.0.2, when trying to load a file I got Class org.apache.hadoop.io.compress.SnappyCodec not found Can you please advise how to enable snappy in Spark? Regards Arthur scala
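The usual cause is that the Hadoop native libraries (which include the Snappy bindings) are not on Spark's `java.library.path`. A hedged sketch of the relevant settings; the paths are illustrative and depend on the Hadoop install:

```shell
# spark-env.sh: make the Hadoop native libraries visible to Spark
export SPARK_LIBRARY_PATH=$HADOOP_HOME/lib/native

# Alternatively, in spark-defaults.conf (Spark 1.x properties):
# spark.executor.extraLibraryPath  /usr/local/hadoop/lib/native
# spark.driver.extraLibraryPath    /usr/local/hadoop/lib/native
```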

Re: org.apache.hadoop.io.compress.SnappyCodec not found

2014-08-28 Thread arthur.hk.c...@gmail.com
? Regards Arthur On 29 Aug, 2014, at 2:39 am, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I use Hadoop 2.4.1 and HBase 0.98.5 with snappy enabled in both Hadoop and HBase. With default setting in Spark 1.0.2, when trying to load a file I got Class

Re: org.apache.hadoop.io.compress.SnappyCodec not found

2014-08-28 Thread arthur.hk.c...@gmail.com
, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, my check native result: hadoop checknative 14/08/29 02:54:51 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version 14/08/29 02:54:51 INFO zlib.ZlibFactory

Re: org.apache.hadoop.io.compress.SnappyCodec not found

2014-08-28 Thread arthur.hk.c...@gmail.com
Hi, I fixed the issue by copying libsnappy.so into the Java JRE. Regards Arthur On 29 Aug, 2014, at 8:12 am, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, If change my etc/hadoop/core-site.xml from property nameio.compression.codecs/name value

Spark Hive max key length is 767 bytes

2014-08-28 Thread arthur.hk.c...@gmail.com
(Please ignore if duplicated) Hi, I use Spark 1.0.2 with Hive 0.13.1 I have already set the hive mysql database to latine1; mysql: alter database hive character set latin1; Spark: scala val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc) scala hiveContext.hql(create table
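The 767-byte key limit comes from MySQL's InnoDB index-width limit on utf8 columns in the Hive metastore tables; forcing the metastore database to latin1, as suggested later in this thread, narrows the indexed keys back under the limit. A sketch, where the metastore database name (`hive` here, `metastore_db` elsewhere in the thread) depends on your setup:

```sql
-- Run against the MySQL server backing the Hive metastore.
ALTER DATABASE hive CHARACTER SET latin1;

-- Tables already created as utf8 may also need converting
-- (illustrative for a single metastore table):
-- ALTER TABLE COLUMNS_V2 CONVERT TO CHARACTER SET latin1;
```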

Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-27 Thread arthur.hk.c...@gmail.com
Hi, I need to use Spark with HBase 0.98 and tried to compile Spark 1.0.2 with HBase 0.98, My steps: wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2.tgz tar -vxf spark-1.0.2.tgz cd spark-1.0.2 edit project/SparkBuild.scala, set HBASE_VERSION // HBase version; set as appropriate. val

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-27 Thread arthur.hk.c...@gmail.com
/apache/spark/pull/1893 On Wed, Aug 27, 2014 at 6:57 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: (correction: “Compilation Error: Spark 1.0.2 with HBase 0.98”, please ignore if duplicated) Hi, I need to use Spark with HBase 0.98 and tried to compile Spark 1.0.2

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-27 Thread arthur.hk.c...@gmail.com
/spark/pull/1893.patch BTW 0.98.5 has been released - you can specify 0.98.5-hadoop2 in the pom.xml Cheers On Wed, Aug 27, 2014 at 7:18 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi Ted, Thank you so much!! As I am new to Spark, can you please advise the steps

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-27 Thread arthur.hk.c...@gmail.com
(offset -40 lines). Hunk #2 succeeded at 195 (offset -40 lines). On 28 Aug, 2014, at 10:53 am, Ted Yu yuzhih...@gmail.com wrote: Can you use this command ? patch -p1 -i 1893.patch Cheers On Wed, Aug 27, 2014 at 7:41 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi Ted

Compilation FAILURE : Spark 1.0.2 / Project Hive (0.13.1)

2014-08-27 Thread arthur.hk.c...@gmail.com
Hi, I use Hadoop 2.4.1, HBase 0.98.5, Zookeeper 3.4.6 and Hive 0.13.1. I just tried to compile Spark 1.0.2, but got error on Spark Project Hive, can you please advise which repository has org.spark-project.hive:hive-metastore:jar:0.13.1? FYI, below is my repository setting in maven which