Re: spark sql

2014-08-02 Thread Ted Yu
I noticed a misspelling in the compilation error (extra letter 'a'): new Function*a*. But in your code the spelling was right. A bit confused. On Fri, Aug 1, 2014 at 1:32 PM, Madabhattula Rajesh Kumar mrajaf...@gmail.com wrote: Hi Team, I'm not able to print the values from Spark Sql

Re: Spark Hbase job taking long time

2014-08-07 Thread Ted Yu
On Wed, Aug 6, 2014 at 6:41 AM, Ted Yu yuzhih...@gmail.com wrote: Can you try specifying some value (100, e.g.) for hbase.mapreduce.scan.cachedrows in your conf ? bq. table contains 10lakh rows How many rows are there in the table ? nit: Example uses classOf[TableInputFormat] instead
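
For reference, a minimal sketch of setting the scan caching value before creating an HBase-backed RDD; the table name is a placeholder, and it assumes an HBase 0.98-era TableInputFormat with Spark 1.x:

  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.hbase.client.Result
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable
  import org.apache.hadoop.hbase.mapreduce.TableInputFormat

  // sc is an existing SparkContext
  val hbaseConf = HBaseConfiguration.create()
  hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")  // placeholder table name
  hbaseConf.set("hbase.mapreduce.scan.cachedrows", "100")  // rows fetched per scanner RPC
  val hbaseRdd = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
    classOf[ImmutableBytesWritable], classOf[Result])
  println(hbaseRdd.count())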

Re: Multiple column families vs Multiple tables

2014-08-19 Thread Ted Yu
bq. does not do well with anything above two or three column families Current hbase releases, such as 0.98.x, would do better than the above. 5 column families should be accommodated. Cheers On Tue, Aug 19, 2014 at 3:06 PM, Wei Liu wei@stellarloyalty.com wrote: We are doing schema

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-27 Thread Ted Yu
See SPARK-1297 The pull request is here: https://github.com/apache/spark/pull/1893 On Wed, Aug 27, 2014 at 6:57 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: (correction: “Compilation Error: Spark 1.0.2 with HBase 0.98”, please ignore if duplicated) Hi, I need to use

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-27 Thread Ted Yu
!! As I am new to Spark, can you please advise the steps about how to apply this patch to my spark-1.0.2 source folder? Regards Arthur On 28 Aug, 2014, at 10:13 am, Ted Yu yuzhih...@gmail.com wrote: See SPARK-1297 The pull request is here: https://github.com/apache/spark/pull/1893 On Wed

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-27 Thread Ted Yu
-with-maven.md -- File to patch: On 28 Aug, 2014, at 10:24 am, Ted Yu yuzhih...@gmail.com wrote: You can get the patch from this URL: https://github.com/apache/spark/pull/1893.patch BTW 0.98.5 has been released - you can specify 0.98.5-hadoop2 in the pom.xml

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-27 Thread Ted Yu
/pom.xml.rej patching file docs/building-with-maven.md patching file examples/pom.xml Hunk #1 succeeded at 122 (offset -40 lines). Hunk #2 succeeded at 195 (offset -40 lines). On 28 Aug, 2014, at 10:53 am, Ted Yu yuzhih...@gmail.com wrote: Can you use this command ? patch -p1 -i 1893.patch

Re: Compilation FAILURE : Spark 1.0.2 / Project Hive (0.13.1)

2014-08-27 Thread Ted Yu
See this thread: http://search-hadoop.com/m/JW1q5wwgyL1/Working+Formula+for+Hive+0.13subj=Re+Working+Formula+for+Hive+0+13+ On Wed, Aug 27, 2014 at 8:54 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I use Hadoop 2.4.1, HBase 0.98.5, Zookeeper 3.4.6 and Hive 0.13.1. I

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-27 Thread Ted Yu
:58 PM, Ted Yu yuzhih...@gmail.com wrote: Looks like the patch given by that URL only had the last commit. I have attached pom.xml for spark-1.0.2 to SPARK-1297 You can download it and replace examples/pom.xml with the downloaded pom I am running this command locally: mvn -Phbase-hadoop2

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-28 Thread Ted Yu
information. Regards Arthur On 28 Aug, 2014, at 12:22 pm, Ted Yu yuzhih...@gmail.com wrote: I forgot to include '-Dhadoop.version=2.4.1' in the command below. The modified command passed. You can verify the dependence on hbase 0.98 through this command: mvn -Phbase-hadoop2,hadoop-2.4,yarn

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-28 Thread Ted Yu
/assembly/target/scala-2.10/ $ ll assembly/ total 20 -rw-rw-r--. 1 hduser hadoop 11795 Jul 26 05:50 pom.xml -rw-rw-r--. 1 hduser hadoop 507 Jul 26 05:50 README drwxrwxr-x. 4 hduser hadoop 4096 Jul 26 05:50 *src* Regards Arthur On 28 Aug, 2014, at 6:19 pm, Ted Yu yuzhih...@gmail.com

Re: SPARK-1297 patch error (spark-1297-v4.txt )

2014-08-28 Thread Ted Yu
I attached patch v5 which corresponds to the pull request. Please try again. On Thu, Aug 28, 2014 at 9:50 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I have just tried to apply the patch of SPARK-1297: https://issues.apache.org/jira/browse/SPARK-1297 There are two

Re: SPARK-1297 patch error (spark-1297-v4.txt )

2014-08-28 Thread Ted Yu
bq. Spark 1.0.2 For the above release, you can download pom.xml attached to the JIRA and place it in examples directory I verified that the build against 0.98.4 worked using this command: mvn -Dhbase.profile=hadoop2 -Phadoop-2.4,yarn -Dhadoop.version=2.4.1 -DskipTests clean package Patch v5

Re: org.apache.spark.examples.xxx

2014-08-30 Thread Ted Yu
bq. how was the spark...example...jar file build? You can use the following command to build against hadoop 2.4: mvn -Phadoop-2.4,yarn -Dhadoop.version=2.4.1 -DskipTests clean package examples jar can be found under examples/target Cheers On Sat, Aug 30, 2014 at 6:54 AM, Akhil Das

Re: org.apache.spark.examples.xxx

2014-08-30 Thread Ted Yu
Did you run sbt under /home/filip/spark-ex-regression ? '~/git/spark/data/mllib/sample_linear_regression_data.txt' was interpreted as rooted under /home/filip/spark-ex-regression Cheers On Sat, Aug 30, 2014 at 9:28 AM, filipus floe...@gmail.com wrote: compilation works but execution not at

Re: Spark Streaming into HBase

2014-09-03 Thread Ted Yu
the default one with the CDH5.1.0 distro. Thank you for the help. On Wed, Sep 3, 2014 at 2:09 PM, Ted Yu yuzhih...@gmail.com wrote: Is hbase-site.xml in the classpath ? Do you observe any exception from the code below or in region server log ? Which hbase release are you using ? On Wed, Sep 3

Re: How to profile a spark application

2014-09-08 Thread Ted Yu
See https://cwiki.apache.org/confluence/display/SPARK/Profiling+Spark+Applications+Using+YourKit On Sep 8, 2014, at 2:48 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi, Can someone tell me how to profile a spark application. -Karthik

Re: compiling spark source code

2014-09-13 Thread Ted Yu
bq. [error] (repl/compile:compile) Compilation failed Can you pastebin more of the output ? Cheers

Re: compiling spark source code

2014-09-13 Thread Ted Yu
bq. [error] File name too long It is not clear which file(s) loadfiles was loading. Is the filename in earlier part of the output ? Cheers On Sat, Sep 13, 2014 at 10:58 AM, kkptninja kkptni...@gmail.com wrote: Hi Ted, Thanks for the prompt reply :) please find details of the issue at this

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread Ted Yu
The Spark examples module builds against hbase 0.94 by default. If you want to run against 0.98, see: SPARK-1297 https://issues.apache.org/jira/browse/SPARK-1297 Cheers On Sun, Sep 14, 2014 at 7:36 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I have tried to run

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread Ted Yu
-with-maven.md docs/building-with-maven.md |index 672d0ef..f8bcd2b 100644 |--- docs/building-with-maven.md |+++ docs/building-with-maven.md -- File to patch: Please advise. Regards Arthur On 14 Sep, 2014, at 10:48 pm, Ted Yu yuzhih...@gmail.com wrote: Spark examples

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread Ted Yu
patching file examples/pom.xml Hunk #1 FAILED at 45. Hunk #2 FAILED at 110. 2 out of 2 hunks FAILED -- saving rejects to file examples/pom.xml.rej Still got errors. Regards Arthur On 14 Sep, 2014, at 11:33 pm, Ted Yu yuzhih...@gmail.com wrote: spark-1297-v5.txt is level 0 patch Please

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread Ted Yu
defined module HBaseTest Now only got error when trying to run “package org.apache.spark.examples” Please advise. Regards Arthur On 14 Sep, 2014, at 11:41 pm, Ted Yu yuzhih...@gmail.com wrote: I applied the patch on master branch without rejects. If you use spark 1.0.2, use pom.xml

Re: HBase and non-existent TableInputFormat

2014-09-16 Thread Ted Yu
bq. TableInputFormat does not even exist in hbase-client API It is in hbase-server module. Take a look at http://hbase.apache.org/book.html#mapreduce.example.read On Tue, Sep 16, 2014 at 8:18 AM, Y. Dong tq00...@gmail.com wrote: Hello, I’m currently using spark-core 1.1 and hbase 0.98.5 and
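
If the project is built with sbt, one way to pull in TableInputFormat is to depend on hbase-server in addition to the client artifacts; a sketch, with illustrative version numbers:

  // build.sbt -- version numbers are illustrative
  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core"   % "1.1.0" % "provided",
    "org.apache.hbase" %  "hbase-common" % "0.98.5-hadoop2",
    "org.apache.hbase" %  "hbase-client" % "0.98.5-hadoop2",
    "org.apache.hbase" %  "hbase-server" % "0.98.5-hadoop2"  // TableInputFormat lives here
  )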

Re: HBase and non-existent TableInputFormat

2014-09-16 Thread Ted Yu
. Just curious, what’s the difference between hbase-client and hbase-server? On 16 Sep 2014, at 17:01, Ted Yu yuzhih...@gmail.com wrote: bq. TableInputFormat does not even exist in hbase-client API It is in hbase-server module. Take a look at http://hbase.apache.org/book.html

Re: HBase 0.96+ with Spark 1.0+

2014-09-18 Thread Ted Yu
The stack trace mentioned OutOfMemory error. See: http://stackoverflow.com/questions/3003855/increase-permgen-space On Sep 18, 2014, at 1:59 AM, Reinis Vicups sp...@orbit-x.de wrote: I am humbly bumping this since even after another week of trying I haven't had luck to fix this yet. On

Re:

2014-09-24 Thread Ted Yu
bq. at com.paypal.risk.rds.dragon.storage.hbase.HbaseRDDBatch$$ anonfun$batchInsertEdges$3.apply(HbaseRDDBatch.scala:179) Can you reveal what HbaseRDDBatch.scala does ? Cheers On Wed, Sep 24, 2014 at 8:46 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: One of my big spark program

Re:

2014-09-24 Thread Ted Yu
). BTW, I found batched Put actually faster than generating HFiles... Jianshi On Wed, Sep 24, 2014 at 11:49 PM, Ted Yu yuzhih...@gmail.com wrote: bq. at com.paypal.risk.rds.dragon.storage.hbase.HbaseRDDBatch$$ anonfun$batchInsertEdges$3.apply(HbaseRDDBatch.scala:179) Can you reveal

Re:

2014-09-24 Thread Ted Yu
HFiles... Jianshi On Wed, Sep 24, 2014 at 11:49 PM, Ted Yu yuzhih...@gmail.com wrote: bq. at com.paypal.risk.rds.dragon.storage.hbase.HbaseRDDBatch$$ anonfun$batchInsertEdges$3.apply(HbaseRDDBatch.scala:179) Can you reveal what HbaseRDDBatch.scala does ? Cheers On Wed, Sep 24

Re: task getting stuck

2014-09-24 Thread Ted Yu
previous reply to Debasish, all region servers are idle. I don't think it's caused by hotspotting. Besides, only 6 out of 3000 tasks were stuck, and their inputs are about only 80MB each. Jianshi On Wed, Sep 24, 2014 at 11:58 PM, Ted Yu yuzhih...@gmail.com wrote: I was thinking along the same

Re: Spark Hbase

2014-09-24 Thread Ted Yu
Take a look at the following under examples: examples/src/main/python/hbase_inputformat.py examples/src/main/python/hbase_outputformat.py examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala examples/src/main/scala/org/apache/spark/examples/pythonconverters/HBaseConverters.scala

Re:

2014-09-25 Thread Ted Yu
only 80MB each. Jianshi On Wed, Sep 24, 2014 at 11:58 PM, Ted Yu yuzhih...@gmail.com wrote: I was thinking along the same line. Jianshi: See http://hbase.apache.org/book.html#d0e6369 On Wed, Sep 24, 2014 at 8:56 AM, Debasish Das debasish.da...@gmail.com wrote: HBase regionserver needs

Re: Build error when using spark with breeze

2014-09-26 Thread Ted Yu
spark-core's dependency on commons-math3 is @ test scope (core/pom.xml): <dependency> <groupId>org.apache.commons</groupId> <artifactId>commons-math3</artifactId> <version>3.3</version> <scope>test</scope> </dependency> Adjusting the scope should solve the problem below. On Fri, Sep
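
If the commons-math3 classes are needed at runtime by your own application, an alternative to patching Spark's pom is to declare the dependency directly in your project; a minimal sbt sketch, reusing the version from the snippet above:

  // build.sbt -- make commons-math3 available on your application's runtime classpath
  libraryDependencies += "org.apache.commons" % "commons-math3" % "3.3"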

Re: Build error when using spark with breeze

2014-09-26 Thread Ted Yu
, 2014 at 5:47 PM, Ted Yu yuzhih...@gmail.com wrote: spark-core's dependency on commons-math3 is @ test scope (core/pom.xml): <dependency> <groupId>org.apache.commons</groupId> <artifactId>commons-math3</artifactId> <version>3.3</version> <scope>test</scope> </dependency> Adjusting

Re: How to do broadcast join in SparkSQL

2014-09-28 Thread Ted Yu
Have you looked at SPARK-1800 ? e.g. see sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala Cheers On Sun, Sep 28, 2014 at 1:55 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: I cannot find it in the documentation. And I have a dozen dimension tables to (left) join... Cheers,
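
For orientation, a sketch of letting SparkSQL broadcast the smaller side of a join automatically, assuming a 1.1-era SQLContext where the spark.sql.autoBroadcastJoinThreshold setting is available; table and column names are placeholders:

  // Tables whose estimated size is below the threshold (in bytes) are broadcast in joins.
  sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (10 * 1024 * 1024).toString)
  val joined = sqlContext.sql(
    "SELECT f.*, d.name FROM fact_table f JOIN small_dim d ON f.dim_id = d.id")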

Re: Reading from HBase is too slow

2014-09-30 Thread Ted Yu
Can you launch a job which exercises TableInputFormat on the same table without using Spark ? This would show whether the slowdown is in HBase code or somewhere else. Cheers On Mon, Sep 29, 2014 at 11:40 PM, Tao Xiao xiaotao.cs@gmail.com wrote: I checked HBase UI. Well, this table is not

Re: Reading from HBase is too slow

2014-10-01 Thread Ted Yu
As far as I know, that feature is not in CDH 5.0.0 FYI On Wed, Oct 1, 2014 at 9:34 AM, Vladimir Rodionov vrodio...@splicemachine.com wrote: Using TableInputFormat is not the fastest way of reading data from HBase. Do not expect 100s of Mb per sec. You probably should take a look at M/R over

Re: Spark inside Eclipse

2014-10-01 Thread Ted Yu
Cycling bits: http://search-hadoop.com/m/JW1q5wxkXH/spark+eclipsesubj=Buidling+spark+in+Eclipse+Kepler On Wed, Oct 1, 2014 at 4:35 PM, Sanjay Subramanian sanjaysubraman...@yahoo.com.invalid wrote: hey guys Is there a way to run Spark in local mode from within Eclipse. I am running Eclipse

Re: org/apache/commons/math3/random/RandomGenerator issue

2014-10-04 Thread Ted Yu
Cycling bits: http://search-hadoop.com/m/JW1q5UX9S1/breeze+sparksubj=Build+error+when+using+spark+with+breeze On Sat, Oct 4, 2014 at 12:59 PM, anny9699 anny9...@gmail.com wrote: Hi, I use the breeze.stats.distributions.Bernoulli in my code, however met this problem

Re: org/apache/commons/math3/random/RandomGenerator issue

2014-10-04 Thread Ted Yu
by changing the spark-1.1.0 core pom file? Thanks! On Sat, Oct 4, 2014 at 1:06 PM, Ted Yu yuzhih...@gmail.com wrote: Cycling bits: http://search-hadoop.com/m/JW1q5UX9S1/breeze+sparksubj=Build+error+when+using+spark+with+breeze On Sat, Oct 4, 2014 at 12:59 PM, anny9699 anny9...@gmail.com wrote

Re: Building pyspark with maven?

2014-10-08 Thread Ted Yu
Have you looked at http://spark.apache.org/docs/latest/building-with-maven.html ? Especially http://spark.apache.org/docs/latest/building-with-maven.html#building-for-pyspark-on-yarn Cheers On Wed, Oct 8, 2014 at 2:01 PM, Stephen Boesch java...@gmail.com wrote: The build instructions for

Re: how to find the sources for spark-project

2014-10-11 Thread Ted Yu
I found this on computer where I built Spark: $ jar tvf /homes/hortonzy/.m2/repository//org/spark-project/hive/hive-exec/0.13.1/hive-exec-0.13.1.jar | grep ParquetHiveSerDe 2228 Mon Jun 02 12:50:16 UTC 2014 org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe$1.class 1442 Mon Jun 02

Re: ClasssNotFoundExeception was thrown while trying to save rdd

2014-10-12 Thread Ted Yu
Your app is named scala.HBaseApp Does it read / write to HBase ? Just curious. On Sun, Oct 12, 2014 at 8:00 AM, Tao Xiao xiaotao.cs@gmail.com wrote: Hi all, I'm using CDH 5.0.1 (Spark 0.9) and submitting a job in Spark Standalone Cluster mode. The job is quite simple as follows:

Re: How to close resources shared in executor?

2014-10-15 Thread Ted Yu
Pardon me - there was a typo in the previous email. Calling table.close() is the recommended approach. HConnectionManager does reference counting. When all references to the underlying connection are gone, the connection would be released. Cheers On Wed, Oct 15, 2014 at 7:13 AM, Ted Yu yuzhih
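
A sketch of the pattern being discussed, opening and closing the table once per partition (HBase 0.94/0.98-style client API; table, family and qualifier names are placeholders):

  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.hbase.client.{HTable, Put}
  import org.apache.hadoop.hbase.util.Bytes

  // rdd is an RDD[(String, String)] used for illustration
  rdd.foreachPartition { rows =>
    val table = new HTable(HBaseConfiguration.create(), "my_table")
    try {
      rows.foreach { case (rowKey, value) =>
        val put = new Put(Bytes.toBytes(rowKey))
        put.add(Bytes.toBytes("f1"), Bytes.toBytes("c1"), Bytes.toBytes(value))
        table.put(put)
      }
    } finally {
      table.close()  // drops this reference to the shared, reference-counted HConnection
    }
  }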

Re: How to close resources shared in executor?

2014-10-16 Thread Ted Yu
in which shutdown hooks run is not defined so * were problematic for clients of HConnection that wanted to register their * own shutdown hooks so we removed ours though this shifts the onus for * cleanup to the client. ​ 2014-10-15 22:31 GMT+08:00 Ted Yu yuzhih...@gmail.com: Pardon me

Re: How to close resources shared in executor?

2014-10-16 Thread Ted Yu
(); this.pool = pool; this.finishSetup(); } in which cleanupConnectionOnClose is false 2014-10-16 22:51 GMT+08:00 Ted Yu yuzhih...@gmail.com: Which hbase release are you using ? Let me refer to 0.94 code hbase. Take a look at the following method in src/main/java/org/apache/hadoop

Re: buffer overflow when running Kmeans

2014-10-21 Thread Ted Yu
Just posted below for a similar question. Have you seen this thread ? http://search-hadoop.com/m/JW1q5ezXPH/KryoException%253A+Buffer+overflowsubj=RE+spark+nbsp+kryo+serilizable+nbsp+exception On Tue, Oct 21, 2014 at 2:44 PM, Yang tedd...@gmail.com wrote: this is the stack trace I got
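
A common remedy for a Kryo buffer overflow is to enlarge the serializer buffer; a minimal sketch assuming the 1.x-era configuration key (the key was renamed in later releases, so check the configuration docs for your version):

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryoserializer.buffer.mb", "64")  // enlarge the Kryo buffer (value in MB)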

Re:

2014-10-22 Thread Ted Yu
See first section of http://spark.apache.org/community On Wed, Oct 22, 2014 at 7:42 AM, Margusja mar...@roo.ee wrote: unsubscribe - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail:

Re: Exceptions not caught?

2014-10-23 Thread Ted Yu
Can you show the stack trace ? Also, how do you catch exceptions ? Did you specify TProtocolException ? Cheers On Thu, Oct 23, 2014 at 3:40 PM, ankits ankitso...@gmail.com wrote: Hi, I'm running a spark job and encountering an exception related to thrift. I wanted to know where this is

Re: Exceptions not caught?

2014-10-23 Thread Ted Yu
bq. Required field 'X' is unset! Struct:Y Can you check your class Y and fix the above ? Cheers On Thu, Oct 23, 2014 at 3:55 PM, ankits ankitso...@gmail.com wrote: I am simply catching all exceptions (like case e:Throwable = println(caught: +e) ) Here is the stack trace: 2014-10-23

Re: Ephemeral Hive metastore for HiveContext?

2014-10-27 Thread Ted Yu
Please see https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-EmbeddedMetastore Cheers On Oct 27, 2014, at 6:20 AM, Cheng Lian lian.cs@gmail.com wrote: I have never tried this yet, but maybe you can use an in-memory Derby database as

Re: Unsubscribe

2014-10-27 Thread Ted Yu
Take a look at the first section of: http://spark.apache.org/community Cheers On Mon, Oct 27, 2014 at 6:50 AM, Ian Ferreira ianferre...@hotmail.com wrote: unsubscribe

Re: install sbt

2014-10-28 Thread Ted Yu
Have you read this ? http://lancegatlin.org/tech/centos-6-install-sbt On Tue, Oct 28, 2014 at 7:54 AM, Pagliari, Roberto rpagli...@appcomsci.com wrote: Is there a repo or some kind of instruction about how to install sbt for centos? Thanks,

Re: Reading from Hbase using python

2014-11-12 Thread Ted Yu
Can you give us a bit more detail: the hbase release you're using; whether you can reproduce using hbase shell. I did the following using hbase shell against 0.98.4: hbase(main):001:0> create 'test', 'f1' 0 row(s) in 2.9140 seconds => Hbase::Table - test hbase(main):002:0> put 'test', 'row1', 'f1:1',

Re: Reading from Hbase using python

2014-11-12 Thread Ted Yu
the solution. Do u have any idea? 2014-11-12 18:26 GMT-02:00 Ted Yu yuzhih...@gmail.com: Can you give us a bit more detail: the hbase release you're using; whether you can reproduce using hbase shell. I did the following using hbase shell against 0.98.4: hbase(main):001:0> create 'test', 'f1' 0

Re: Reading from Hbase using python

2014-11-12 Thread Ted Yu
; } return CellUtil.cloneValue(cells[0]); This explains why you only got one row. In the thread you mentioned, see the code posted by freedafeng which iterates the Cells in Result. Cheers On Wed, Nov 12, 2014 at 1:04 PM, Ted Yu yuzhih...@gmail.com wrote: To my knowledge, Spark 1.1 comes with HBase

Re: EmptyRDD

2014-11-14 Thread Ted Yu
See http://spark.apache.org/docs/0.8.1/api/core/org/apache/spark/rdd/EmptyRDD.html On Nov 14, 2014, at 2:09 AM, Deep Pradhan pradhandeep1...@gmail.com wrote: How to create an empty RDD in Spark? Thank You
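
For completeness, two ways to get an empty RDD, sketched against a 1.x SparkContext:

  // sc is an existing SparkContext
  val empty1 = sc.emptyRDD[Int]                // dedicated EmptyRDD with no partitions
  val empty2 = sc.parallelize(Seq.empty[Int])  // fallback that also works on older releases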

Re: Building Spark with hive does not work

2014-11-17 Thread Ted Yu
Looks like this was where you got that commandline: http://search-hadoop.com/m/JW1q5RlPrl Cheers On Mon, Nov 17, 2014 at 9:44 AM, Hao Ren inv...@gmail.com wrote: Sry for spamming, Just after my previous post, I noticed that the command used is: ./sbt/sbt -Phive -Phive-thirftserver clean

Re: Missing SparkSQLCLIDriver and Beeline drivers in Spark

2014-11-17 Thread Ted Yu
Minor correction: there was a typo in the commandline: hive-thirftserver should be hive-thriftserver. Cheers On Thu, Aug 7, 2014 at 6:49 PM, Cheng Lian lian.cs@gmail.com wrote: Things have changed a bit in the master branch, and the SQL programming guide in master branch actually doesn’t apply

Re: Spark with HBase

2014-12-03 Thread Ted Yu
Which hbase release are you running ? If it is 0.98, take a look at: https://issues.apache.org/jira/browse/SPARK-1297 Thanks On Dec 2, 2014, at 10:21 PM, Jai jaidishhari...@gmail.com wrote: I am trying to use Apache Spark with a pseudo-distributed Hadoop Hbase Cluster and I am looking for

Re: Spark executor lost

2014-12-03 Thread Ted Yu
bq. to get the logs from the data nodes Minor correction: the logs are collected from machines where node managers run. Cheers On Wed, Dec 3, 2014 at 3:39 PM, Ganelin, Ilya ilya.gane...@capitalone.com wrote: You want to look further up the stack (there are almost certainly other errors

Re: How can I compile only the core and streaming (so that I can get test utilities of streaming)?

2014-12-05 Thread Ted Yu
Please specify '-DskipTests' on commandline. Cheers On Dec 5, 2014, at 3:52 AM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, I'm currently developing a Spark Streaming application and trying to write my first unit test. I've used Java for this application, and I also need use Java

Re: Adding Spark Cassandra dependency breaks Spark Streaming?

2014-12-05 Thread Ted Yu
Can you try with maven ? diff --git a/streaming/pom.xml b/streaming/pom.xml index b8b8f2e..6cc8102 100644 --- a/streaming/pom.xml +++ b/streaming/pom.xml @@ -68,6 +68,11 @@ <artifactId>junit-interface</artifactId> <scope>test</scope> </dependency> +<dependency> +

Re: How to incrementally compile spark examples using mvn

2014-12-05 Thread Ted Yu
I tried the following: 511 rm -rf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.3.0-SNAPSHOT/ 513 mvn -am -pl streaming package -DskipTests [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM .. SUCCESS [4.976s] [INFO] Spark Project Networking

Re: Where can you get nightly builds?

2014-12-06 Thread Ted Yu
See https://amplab.cs.berkeley.edu/jenkins/view/Spark/ See also https://issues.apache.org/jira/browse/SPARK-1517 Cheers On Sat, Dec 6, 2014 at 6:41 AM, Simone Franzini captainfr...@gmail.com wrote: I recently read in the mailing list that there are now nightly builds available. However, I

Re: run JavaAPISuite with mavem

2014-12-06 Thread Ted Yu
In master branch, I only found JavaAPISuite in comment: spark tyu$ find . -name '*.scala' -exec grep JavaAPISuite {} \; -print * For usage example, see test case JavaAPISuite.testJavaJdbcRDD. * converted into a `Object` array. For usage example, see test case JavaAPISuite.testJavaJdbcRDD.

Re: run JavaAPISuite with mavem

2014-12-06 Thread Ted Yu
Pardon me, the test is here: sql/core/src/test/java/org/apache/spark/sql/api/java/JavaAPISuite.java You can run 'mvn test' under sql/core Cheers On Sat, Dec 6, 2014 at 5:55 PM, Ted Yu yuzhih...@gmail.com wrote: In master branch, I only found JavaAPISuite in comment: spark tyu$ find . -name

Re: run JavaAPISuite with mavem

2014-12-06 Thread Ted Yu
BTW I didn't find JavaAPISuite in test output either. Cheers On Sat, Dec 6, 2014 at 9:12 PM, Koert Kuipers ko...@tresata.com wrote: Ted, i mean core/src/test/java/org/apache/spark/JavaAPISuite.java On Sat, Dec 6, 2014 at 9:27 PM, Ted Yu yuzhih...@gmail.com wrote: Pardon me, the test

Re: run JavaAPISuite with mavem

2014-12-07 Thread Ted Yu
, 2014 at 9:59 PM, Ted Yu yuzhih...@gmail.com wrote: I tried to run tests for core but there were failures. e.g.: ExternalAppendOnlyMapSuite: - simple insert - insert with collision - ordering - null keys and values - simple

Re: NoClassDefFoundError

2014-12-07 Thread Ted Yu
See the following threads: http://search-hadoop.com/m/JW1q5kjNlK http://search-hadoop.com/m/JW1q5XqSDk Cheers On Sun, Dec 7, 2014 at 9:35 AM, Julius K fooliuskool...@gmail.com wrote: Hi everyone, I am new to Spark and encountered a problem. I want to use an external library in a java

Re: run JavaAPISuite with mavem

2014-12-08 Thread Ted Yu
of the official build the java api does not get tested then? i am sure there is a good reason for it, but thats surprising to me. On Sun, Dec 7, 2014 at 12:19 PM, Ted Yu yuzhih...@gmail.com wrote: Looking at the pom.xml, I think I found the reason - scalatest is used. With the following

Re: Apache Spark 1.1.1 with Hbase 0.98.8-hadoop2 and hadoop 2.3.0

2014-12-17 Thread Ted Yu
Have you seen this thread ? http://search-hadoop.com/m/JW1q5FS8Mr1 If the problem you encountered is different, please give full stack trace. Cheers On Wed, Dec 17, 2014 at 5:43 AM, Amit Singh Hora hora.a...@gmail.com wrote: Hi All, I have downloaded pre built Spark 1.1.1 for Hadoop 2.3.0

Re: Spark SQL 1.1.1 reading LZO compressed json files

2014-12-17 Thread Ted Yu
See this thread: http://search-hadoop.com/m/JW1q5HAuFv which references https://issues.apache.org/jira/browse/SPARK-2394 Cheers On Wed, Dec 17, 2014 at 8:21 AM, Jerry Lam chiling...@gmail.com wrote: Hi spark users, Do you know how to read json files using Spark SQL that are LZO compressed?

Re: Spark SQL 1.1.1 reading LZO compressed json files

2014-12-17 Thread Ted Yu
if there are some APIs to do that? Best Regards, Jerry On Wed, Dec 17, 2014 at 11:27 AM, Ted Yu yuzhih...@gmail.com wrote: See this thread: http://search-hadoop.com/m/JW1q5HAuFv which references https://issues.apache.org/jira/browse/SPARK-2394 Cheers On Wed, Dec 17, 2014 at 8:21 AM

Re: When will spark 1.2 released?

2014-12-18 Thread Ted Yu
Interesting, the maven artifacts were dated Dec 10th. However vote for RC2 closed recently: http://search-hadoop.com/m/JW1q5K8onk2/Patrick+spark+1.2.0subj=Re+VOTE+Release+Apache+Spark+1+2+0+RC2+ Cheers On Dec 18, 2014, at 10:02 PM, madhu phatak phatak@gmail.com wrote: It’s on Maven

Re: When will spark 1.2 released?

2014-12-19 Thread Ted Yu
coast) or tomorrow at the latest. On Fri, Dec 19, 2014 at 1:09 AM, Ted Yu yuzhih...@gmail.com wrote: Interesting, the maven artifacts were dated Dec 10th. However vote for RC2 closed recently: http://search-hadoop.com/m/JW1q5K8onk2/Patrick+spark+1.2.0subj=Re+VOTE+Release+Apache+Spark+1+2+0+RC2

Re: Hadoop 2.6 compatibility?

2014-12-19 Thread Ted Yu
You can use hadoop-2.4 profile and pass -Dhadoop.version=2.6.0 Cheers On Fri, Dec 19, 2014 at 12:51 PM, sa asuka.s...@gmail.com wrote: Can Spark be built with Hadoop 2.6? All I see instructions up to are for 2.4 and there does not seem to be a hadoop2.6 profile. If it works with Hadoop 2.6,

Re: Nabble mailing list mirror errors: This post has NOT been accepted by the mailing list yet

2014-12-19 Thread Ted Yu
Andy: I saw two emails from you from yesterday. See this thread: http://search-hadoop.com/m/JW1q5opRsY1 Cheers On Fri, Dec 19, 2014 at 12:51 PM, Andy Konwinski andykonwin...@gmail.com wrote: Yesterday, I changed the domain name in the mailing list archive settings to remove .incubator so

Re: Hadoop 2.6 compatibility?

2014-12-19 Thread Ted Yu
=2.4 which works with Hadoop 2.6. On Fri, Dec 19, 2014 at 12:55 Ted Yu yuzhih...@gmail.com wrote: You can use hadoop-2.4 profile and pass -Dhadoop.version=2.6.0 Cheers On Fri, Dec 19, 2014 at 12:51 PM, sa asuka.s...@gmail.com wrote: Can Spark be built with Hadoop 2.6? All I see instructions

Re: custom python converter from HBase Result to tuple

2014-12-22 Thread Ted Yu
Which HBase version are you using ? Can you show the full stack trace ? Cheers On Mon, Dec 22, 2014 at 11:02 AM, Antony Mayi antonym...@yahoo.com.invalid wrote: Hi, can anyone please give me some help how to write custom converter of hbase data to (for example) tuples of ((family,

Re: How to build Spark against the latest

2014-12-23 Thread Ted Yu
See http://search-hadoop.com/m/JW1q5Cew0j On Tue, Dec 23, 2014 at 8:00 PM, guxiaobo1982 guxiaobo1...@qq.com wrote: Hi, The official pom.xml file only have profile for hadoop version 2.4 as the latest version, but I installed hadoop version 2.6.0 with ambari, how can I build spark against it,

Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

2014-12-24 Thread Ted Yu
bq. even when testing with the example from the stock hbase_outputformat.py Can you take jstack of the above and pastebin it ? Thanks On Wed, Dec 24, 2014 at 4:49 AM, Antony Mayi antonym...@yahoo.com.invalid wrote: Hi, have been using this without any issues with spark 1.1.0 but after

Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

2014-12-24 Thread Ted Yu
:34, Ted Yu yuzhih...@gmail.com wrote: bq. even when testing with the example from the stock hbase_outputformat.py Can you take jstack of the above and pastebin it ? Thanks On Wed, Dec 24, 2014 at 4:49 AM, Antony Mayi antonym...@yahoo.com.invalid wrote: Hi, have been using

Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

2014-12-24 Thread Ted Yu
etc but then it just hangs. very same code runs ok on spark 1.1.0 - the records gets stored in hbase. thanks, Antony. On Thursday, 25 December 2014, 0:37, Ted Yu yuzhih...@gmail.com wrote: I went over the jstack but didn't find any call related to hbase or zookeeper. Do you find

Re: How to build Spark against the latest

2014-12-26 Thread Ted Yu
-- Original -- *From: * Ted Yu;yuzhih...@gmail.com; *Send time:* Wednesday, Dec 24, 2014 12:09 PM *To:* guxiaobo1...@qq.com; *Cc:* user@spark.apache.org; *Subject: * Re: How to build Spark against the latest See http://search

Re: How to build Spark against the latest

2014-12-27 Thread Ted Yu
to skip java tests? I build the distro just fine with Java 8. On Dec 27, 2014 4:21 AM, Ted Yu yuzhih...@gmail.com wrote: In case jdk 1.7 or higher is used to build, --skip-java-test needs to be specified. FYI On Thu, Dec 25, 2014 at 5:03 PM, guxiaobo1982 guxiaobo1...@qq.com wrote

Re: Compile error since Spark 1.2.0

2014-12-27 Thread Ted Yu
Please see: [SPARK-3930] [SPARK-3933] Support fixed-precision decimal in SQL, and some optimizations Cheers On Sat, Dec 27, 2014 at 7:20 PM, zigen dbviewer.zi...@gmail.com wrote: Compile error from Spark 1.2.0 Hello , I am zigen. I am using the Spark SQL 1.1.0. I want to use the Spark

Re: Spark 1.2.0 Yarn not published

2014-12-28 Thread Ted Yu
See this thread: http://search-hadoop.com/m/JW1q5vd61V1/Spark-yarn+1.2.0subj=Re+spark+yarn_2+10+1+2+0+artifacts Cheers On Dec 28, 2014, at 11:13 PM, Aniket Bhatnagar aniket.bhatna...@gmail.com wrote: Hi all I just realized that spark-yarn artifact hasn't been published for 1.2.0

Re: Building Spark 1.2 jmx and jmxtools issue?

2014-12-29 Thread Ted Yu
I got same error when specifying -Pmapr4. For the following command: sbt/sbt -Pyarn -Phive -Phive-thriftserver assembly I got: how can getCommonSuperclass() do its job if different class symbols get the same bytecode-level internal name: org/apache/spark/sql/catalyst/dsl/package$ScalaUdfBuilder

Re: Why the major.minor version of the new hive-exec is 51.0?

2014-12-31 Thread Ted Yu
of things. As a result I'd file the JIRA against Spark. On Wed, Dec 31, 2014 at 12:55 PM, Ted Yu yuzhih...@gmail.com wrote: Michael: hive-exec-0.12.0-protobuf-2.5.jar is not generated from Spark source code, right ? What would be done after the JIRA is opened ? Cheers On Wed, Dec 31, 2014

Re: (send this email to subscribe)

2015-01-02 Thread Ted Yu
There is no need to include user@spark.apache.org in subscription request. FYI On Fri, Jan 2, 2015 at 7:36 AM, Pankaj pankajnaran...@gmail.com wrote:

Re: How to convert String data to RDD.

2015-01-02 Thread Ted Yu
Please see http://search-hadoop.com/m/JW1q53L9PJ On Fri, Jan 2, 2015 at 4:31 PM, RP hadoo...@outlook.com wrote: Hello Guys, Spark noob here. I am trying to create RDD from JSON data fetched from URL parsing. My URL parsing function gives me JSON in string format. How do I convert JSON
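
One common way to turn a JSON string into an RDD and query it, sketched against a 1.1/1.2-era SQLContext with jsonRDD; the JSON payload stands in for whatever the URL-parsing function returns:

  import org.apache.spark.sql.SQLContext

  val sqlContext = new SQLContext(sc)                      // sc is an existing SparkContext
  val jsonString = """{"name": "example", "value": 1}"""   // placeholder for the parsed URL payload
  val jsonRdd = sc.parallelize(Seq(jsonString))            // wrap the string in an RDD[String]
  val schemaRdd = sqlContext.jsonRDD(jsonRdd)              // infer the schema from the JSON
  schemaRdd.registerTempTable("parsed")
  sqlContext.sql("SELECT name, value FROM parsed").collect().foreach(println)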

Re: different akka versions and spark

2015-01-02 Thread Ted Yu
Please see http://akka.io/news/2014/05/22/akka-2.3.3-released.html which points to http://doc.akka.io/docs/akka/2.3.3/project/migration-guide-2.2.x-2.3.x.html?_ga=1.35212129.1385865413.1420220234 Cheers On Fri, Jan 2, 2015 at 9:11 AM, Koert Kuipers ko...@tresata.com wrote: i noticed spark

Re: Where can I find logs set inside RDD processing functions?

2015-02-06 Thread Ted Yu
To add to what Petar said, when YARN log aggregation is enabled, consider specifying yarn.nodemanager.remote-app-log-dir, which is where aggregated logs are saved. Cheers On Fri, Feb 6, 2015 at 12:36 PM, Petar Zecevic petar.zece...@gmail.com wrote: You can enable YARN log aggregation

Re: Can't access remote Hive table from spark

2015-02-07 Thread Ted Yu
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=xiaobogu, access=WRITE, inode=/user:hdfs:hdfs:drwxr-xr-x Looks like permission issue. Can you give access to 'xiaobogu' ? Cheers On Sat, Feb 7, 2015 at 8:15 AM,

Re: ephemeral-hdfs vs persistent-hdfs - performance

2015-02-03 Thread Ted Yu
Using s3a protocol (introduced in hadoop 2.6.0) would be faster compared to s3. The upcoming hadoop 2.7.0 contains some bug fixes for s3a. FYI On Tue, Feb 3, 2015 at 9:48 AM, David Rosenstrauch dar...@darose.net wrote: We use S3 as a main storage for all our input data and our generated

Re: maven doesn't build dependencies with Scala 2.11

2015-02-05 Thread Ted Yu
Now that Kafka 0.8.2.0 has been released, adding external/kafka module works. FYI On Sun, Jan 18, 2015 at 7:36 PM, Ted Yu yuzhih...@gmail.com wrote: bq. there was no 2.11 Kafka available That's right. Adding external/kafka module resulted in: [ERROR] Failed to execute goal on project spark

Re: How do I set spark.local.dirs?

2015-02-06 Thread Ted Yu
Can you try setting SPARK_LOCAL_DIRS in spark-env.sh ? Cheers On Fri, Feb 6, 2015 at 7:30 AM, Joe Wass jw...@crossref.org wrote: I'm running on EC2 and I want to set the directory to use on the slaves (mounted EBS volumes). I have set: spark.local.dir /vol3/my-spark-dir in
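
Besides SPARK_LOCAL_DIRS in spark-env.sh, the scratch directory can also be set programmatically; a sketch with an illustrative path:

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("local-dir-example")
    .set("spark.local.dir", "/vol3/my-spark-dir")  // scratch space for shuffle and spill files
  val sc = new SparkContext(conf)
  // Note: SPARK_LOCAL_DIRS set on the workers (e.g. in spark-env.sh) takes precedence over this.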

Re: Is spark suitable for large scale pagerank, such as 200 million nodes, 2 billion edges?

2015-01-15 Thread Ted Yu
Have you seen http://search-hadoop.com/m/JW1q5pE3P12 ? Please also take a look at the end-to-end performance graph on http://spark.apache.org/graphx/ Cheers On Thu, Jan 15, 2015 at 9:29 AM, txw t...@outlook.com wrote: Hi, I am running PageRank on a large dataset, which includes 200 million
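
A minimal GraphX PageRank sketch for comparison, assuming Spark 1.x GraphX; the edge-list path and tolerance are placeholders:

  import org.apache.spark.graphx.GraphLoader

  // Each line of the edge file is "srcId dstId"; the path is a placeholder
  val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/edges.txt")
  val ranks = graph.pageRank(0.0001).vertices  // iterate until convergence within the tolerance
  ranks.take(5).foreach(println)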

Re: enable debug-level log output of akka?

2015-01-14 Thread Ted Yu
I assume you have looked at: http://doc.akka.io/docs/akka/2.0/scala/logging.html http://doc.akka.io/docs/akka/current/additional/faq.html (Debugging, last question) Cheers On Wed, Jan 14, 2015 at 2:55 PM, Nan Zhu zhunanmcg...@gmail.com wrote: Hi, all though
