Spark Streaming not working in YARN mode

2014-11-19 Thread kam lee
I created a simple Spark Streaming program - it received numbers and computed averages and sent the results to Kafka. It worked perfectly in local mode as well as standalone master/slave mode across a two-node cluster. It did not work however in yarn-client or yarn-cluster mode. The job was

Re: Spark + Tableau

2014-10-30 Thread Denny Lee
When you are starting the thrift server service - are you connecting to it locally or is this on a remote server when you use beeline and/or Tableau? On Thu, Oct 30, 2014 at 8:00 AM, Bojan Kostic blood9ra...@gmail.com wrote: I use beta driver SQL ODBC from Databricks. -- View this message

Re: winutils

2014-10-29 Thread Denny Lee
QQ - did you download the Spark 1.1 binaries that include the Hadoop jars? Does this happen if you're using the Spark 1.1 binaries that do not include the Hadoop jars? On Wed, Oct 29, 2014 at 11:31 AM, Ron Ayoub ronalday...@live.com wrote: Apparently Spark does require Hadoop even if you do not

Re: add external jars to spark-shell

2014-10-20 Thread Denny Lee
--jars (ADD_JARS) does special class loading for Spark, while --driver-class-path (SPARK_CLASSPATH) is captured by the startup scripts and appended to the classpath settings used to start the JVM running the driver. You can reference https://www.concur.com/blog/en-us/connect-tableau-to-sparksql

Re: Spark SQL - custom aggregation function (UDAF)

2014-10-14 Thread Pei-Lun Lee
I created https://issues.apache.org/jira/browse/SPARK-3947 On Tue, Oct 14, 2014 at 3:54 AM, Michael Armbrust mich...@databricks.com wrote: Its not on the roadmap for 1.2. I'd suggest opening a JIRA. On Mon, Oct 13, 2014 at 4:28 AM, Pierre B pierre.borckm...@realimpactanalytics.com wrote:

Re: spark sql union all is slow

2014-10-14 Thread Pei-Lun Lee
Hi, You can merge them into one table by: sqlContext.unionAll(sqlContext.unionAll(sqlContext.table(table_1), sqlContext.table(table_2)), sqlContext.table(table_3)).registerTempTable(table_all) Or load them in one call by:
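The nested calls above can be written more generally by folding unionAll over a list of table names. A sketch only — the table names and context setup are illustrative, and the Spark 1.x SchemaRDD API is assumed:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Assumes table_1..table_3 are already registered tables with identical schemas.
val sc = new SparkContext("local", "union-example")
val sqlContext = new SQLContext(sc)

val merged = Seq("table_1", "table_2", "table_3")
  .map(sqlContext.table)   // look up each registered table
  .reduce(_ unionAll _)    // fold pairwise unions into one SchemaRDD

merged.registerTempTable("table_all")
```

The reduce keeps the code the same length no matter how many tables are merged.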

Re: Interactive interface tool for spark

2014-10-08 Thread moon soo Lee
Hi, Please check Zeppelin, too. http://zeppelin-project.org https://github.com/nflabs/zeppelin It is similar to scala-notebook. Best, moon On Thursday, October 9, 2014, andy petrella andy.petre...@gmail.com wrote: Sure! I'll post updates as well in the ML :-) I'm doing it on twitter for now

Re: REPL like interface for Spark

2014-09-29 Thread moon soo Lee
Hi, There is a project called Zeppelin. You can check it out here: https://github.com/NFLabs/zeppelin Homepage is here: http://zeppelin-project.org/ It's a notebook-style tool (like the databricks demo, scala-notebook) with a nice UI and built-in Spark integration. It's in active development, so don't

Re: REPL like interface for Spark

2014-09-29 Thread moon soo Lee
at 10:48 AM, moon soo Lee leemoon...@gmail.com wrote: Hi, There is a project called Zeppelin. You can check it out here: https://github.com/NFLabs/zeppelin Homepage is here: http://zeppelin-project.org/ It's a notebook-style tool (like the databricks demo, scala-notebook) with a nice UI, with built

Re: Spark Hive max key length is 767 bytes

2014-09-25 Thread Denny Lee
by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) Should I use HIVE 0.12.0 instead of HIVE 0.13.1? Regards Arthur On 31 Aug, 2014, at 6:01 am, Denny Lee denny.g

Re: What is a pre built package of Apache Spark

2014-09-24 Thread Denny Lee
This seems similar to a related Windows issue concerning Python where pyspark couldn't find Python because the PYTHONSTARTUP environment variable wasn't set - by any chance could this be related? On Wed, Sep 24, 2014 at 7:51 PM, christy 760948...@qq.com wrote: Hi I have installed standalone on

Re: SchemaRDD and RegisterAsTable

2014-09-17 Thread Denny Lee
The registered table is stored within the spark context itself.  To have the table available for the thrift server to get access to, you can save the sc table into the Hive context so that way the Thrift server process can see the table.  If you are using derby as your metastore, then the
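A hedged sketch of the approach described above — persisting a table through the HiveContext so that a separate Thrift server process sharing the same metastore can see it. The table name and data are illustrative, and the saveAsTable call assumes the Spark 1.1 HiveContext API:

```scala
import org.apache.spark.sql.hive.HiveContext

case class Event(name: String, count: Int)

val hiveContext = new HiveContext(sc)
import hiveContext.createSchemaRDD  // implicit RDD -> SchemaRDD conversion

// A temp table lives only in this driver process; saving through the
// HiveContext records the table in the shared metastore instead.
val events = sc.parallelize(Seq(Event("a", 1), Event("b", 2)))
events.saveAsTable("events_hive")   // should now be visible to the Thrift server
```

The key point is that both the driver and the Thrift server must point at the same metastore (via the same hive-site.xml), which is exactly what fails with the default local derby metastore.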

SparkContext and multi threads

2014-09-11 Thread moon soo Lee
Hi, I'm trying to make Spark work in a multithreaded Java application. What I'm trying to do is: - Create a single SparkContext - Create multiple SparkILoop and SparkIMain - Inject the created SparkContext into the SparkIMain interpreter. A thread is created for every user request and takes a SparkILoop and
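A minimal sketch of the single-shared-context part of this design (the SparkILoop/SparkIMain wiring is omitted and the names are illustrative). SparkContext supports job submission from multiple threads, so one context can serve many request threads:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SharedContext {
  // One SparkContext per JVM; jobs may be submitted from many threads.
  lazy val sc = new SparkContext(
    new SparkConf().setMaster("local[*]").setAppName("multi-user"))
}

// Simulate per-request threads all sharing the single context.
val threads = (1 to 4).map { i =>
  new Thread(new Runnable {
    def run(): Unit = {
      val sum = SharedContext.sc.parallelize(1 to 100).reduce(_ + _)
      println(s"request " + i + " -> " + sum)
    }
  })
}
threads.foreach(_.start())
threads.foreach(_.join())
```

What does not work is creating multiple SparkContexts in one JVM, which is the constraint that motivates this design.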

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
I’m not sure if I’m completely answering your question here, but I’m currently working (on OSX) with Hadoop 2.5 and I used Spark 1.1 built for Hadoop 2.4 without any issues. On September 11, 2014 at 18:11:46, Haopu Wang (hw...@qilinsoft.com) wrote: I see the binary packages include hadoop 1,

Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-11 Thread Denny Lee
registerTempTable you mentioned works on SqlContext instead of HiveContext. Thanks, Du On 9/10/14, 1:21 PM, Denny Lee denny.g@gmail.com wrote: Actually, when registering the table, it is only available within the sc context you are running it in. For Spark 1.1, the method name is changed

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
, but in Spark 1.1.0, there are separate packages for hadoop 2.3 and 2.4. That implies some difference in Spark according to hadoop version.   From:Denny Lee [mailto:denny.g@gmail.com] Sent: Friday, September 12, 2014 9:35 AM To: user@spark.apache.org; Haopu Wang; d...@spark.apache.org

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
to read from HDFS, you’ll need to build Spark against the specific HDFS version in your environment.”   Did you try to read a hadoop 2.5.0 file using Spark 1.1 with hadoop 2.4?   Thanks!   From:Denny Lee [mailto:denny.g@gmail.com] Sent: Friday, September 12, 2014 10:00 AM To: Patrick

Re: Spark SQL JDBC

2014-09-11 Thread Denny Lee
When you re-ran sbt did you clear out the packages first and ensure that the datanucleus jars were generated within lib_managed? I remembered having to do that when I was working testing out different configs. On Thu, Sep 11, 2014 at 10:50 AM, alexandria1101 alexandria.shea...@gmail.com wrote:

Re: Spark SQL Thrift JDBC server deployment for production

2014-09-11 Thread Denny Lee
Could you provide some context about running this in yarn-cluster mode? The Thrift server that's included within Spark 1.1 is based on Hive 0.12. Hive has been able to work against YARN since Hive 0.10. So when you start the thrift server, provided you copied the hive-site.xml over to the Spark

Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-10 Thread Denny Lee
Actually, when registering the table, it is only available within the sc context you are running it in. For Spark 1.1, the method name is changed to registerTempTable to better reflect that. The Thrift server runs under a different process, meaning that it cannot see any of the

Re: Starting Thriftserver via hostname on Spark 1.1 RC4?

2014-09-04 Thread Denny Lee
your-port This behavior is inherited from Hive since Spark SQL Thrift server is a variant of HiveServer2. ​ On Wed, Sep 3, 2014 at 10:47 PM, Denny Lee denny.g@gmail.com wrote: When I start the thrift server (on Spark 1.1 RC4) via: ./sbin/start-thriftserver.sh --master spark://hostname:7077
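The truncated reply above appears to point at HiveServer2's bind settings, which the Spark SQL Thrift server inherits. A plausible invocation — the hostname and port values are illustrative:

```shell
# hive.server2.thrift.bind.host / hive.server2.thrift.port are standard
# HiveServer2 properties; passing them via --hiveconf should make the
# server listen on the given hostname instead of localhost.
./sbin/start-thriftserver.sh \
  --master spark://hostname:7077 \
  --hiveconf hive.server2.thrift.bind.host=hostname \
  --hiveconf hive.server2.thrift.port=10000
```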

Starting Thriftserver via hostname on Spark 1.1 RC4?

2014-09-03 Thread Denny Lee
When I start the thrift server (on Spark 1.1 RC4) via: ./sbin/start-thriftserver.sh --master spark://hostname:7077 --driver-class-path $CLASSPATH It appears that the thrift server is starting off of localhost as opposed to hostname.  I have set the spark-env.sh to use the hostname, modified the

Spark driver application can not connect to Spark-Master

2014-09-01 Thread moon soo Lee
Hi, I'm developing an application with Spark. My Java application tries to create a Spark context like: public SparkContext createSparkContext() { String execUri = System.getenv("SPARK_EXECUTOR_URI"); String[] jars = SparkILoop.getAddedJars();

Re: SparkSQL HiveContext No Suitable Driver / Cannot Find Driver

2014-08-30 Thread Denny Lee
Oh, forgot to add the managed libraries and the Hive libraries within the CLASSPATH.  As soon as I did that, we’re good to go now. On August 29, 2014 at 22:55:47, Denny Lee (denny.g@gmail.com) wrote: My issue is similar to the issue as noted  http://mail-archives.apache.org/mod_mbox

Re: Spark Hive max key length is 767 bytes

2014-08-30 Thread Denny Lee
Oh, you may be running into an issue with your MySQL setup actually, try running alter database metastore_db character set latin1 so that way Hive (and the Spark HiveContext) can execute properly against the metastore. On August 29, 2014 at 04:39:01, arthur.hk.c...@gmail.com

Spark / Thrift / ODBC connectivity

2014-08-28 Thread Denny Lee
I’m currently using the Spark 1.1 branch and have been able to get the Thrift service up and running.  The quick questions were whether I should able to use the Thrift service to connect to SparkSQL generated tables and/or Hive tables?   As well, by any chance do we have any documents that

RE: Hive From Spark

2014-08-25 Thread Andrew Lee
Lee alee...@hotmail.com wrote: Hopefully there could be some progress on SPARK-2420. It looks like shading may be the voted solution among downgrading. Any idea when this will happen? Could it happen in Spark 1.1.1 or Spark 1.1.2? By the way, regarding bin/spark-sql? Is this more

LDA example?

2014-08-21 Thread Denny Lee
Quick question - is there a handy sample / example of how to use the LDA algorithm within Spark MLLib?   Thanks! Denny

Spark-job error on writing result into hadoop w/ switch_user=false

2014-08-20 Thread Jongyoul Lee
Hi, I've used hdfs 2.3.0-cdh5.0.1, mesos 0.19.1 and a re-compiled spark 1.0.2. For security reasons, we run hdfs and mesos as the user 'hdfs', which is an account name not in the root group, and a non-root user submits the spark job on mesos. With no switch_user, a simple job, which only reads data from

How to configure SPARK_EXECUTOR_URI to access files from maprfs

2014-08-19 Thread Lee Strawther (lstrawth)
We use MapR Hadoop and I have configured mesos-0.18.1 and spark-1.0.1 to work together on top of the nodes running mapr hadoop. I would like to configure spark to access files from the mapr filesystem (maprfs://) and I'm starting with configuring the SPARK_EXECUTOR_URI environment variable in

Re: Seattle Spark Meetup: Spark at eBay - Troubleshooting the everyday issues Slides

2014-08-15 Thread Denny Lee
Apologies but we had placed the settings for downloading the slides to Seattle Spark Meetup members only - but actually meant to share with everyone.  We have since fixed this and now you can download it.  HTH! On August 14, 2014 at 18:14:35, Denny Lee (denny.g@gmail.com) wrote

Seattle Spark Meetup: Spark at eBay - Troubleshooting the everyday issues Slides

2014-08-14 Thread Denny Lee
For those whom were not able to attend the Seattle Spark Meetup - Spark at eBay - Troubleshooting the Everyday Issues, the slides have been now posted at:  http://files.meetup.com/12063092/SparkMeetupAugust2014Public.pdf. Enjoy! Denny

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-31 Thread Andrew Lee
/user/hive/warehouse) On Thu, Jul 31, 2014 at 8:05 AM, Andrew Lee <alee526@> wrote: Hi All, It has been awhile, but what I did to make it work is to ensure the following: 1. Hive is working when you run the Hive CLI and JDBC via Hiveserver2 2. Make sure you have

Issues on spark-shell and spark-submit behave differently on spark-defaults.conf parameter spark.eventLog.dir

2014-07-28 Thread Andrew Lee
Hi All, Not sure if anyone has run into this problem, but it exists in spark 1.0.0 when you specify the location in conf/spark-defaults.conf for spark.eventLog.dir hdfs:///user/$USER/spark/logs to use the $USER env variable. For example, I'm running the command with user 'test'. In
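A sketch of the configuration this thread is discussing (paths illustrative). The report is that spark-shell and spark-submit do not expand the variable consistently, so one of them may end up writing to a literal "$USER" directory on HDFS:

```
# conf/spark-defaults.conf (fragment)
# Per this thread (Spark 1.0.0), $USER is not substituted the same way
# by spark-shell and spark-submit.
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs:///user/$USER/spark/logs
```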

RE: Issues on spark-shell and spark-submit behave differently on spark-defaults.conf parameter spark.eventLog.dir

2014-07-28 Thread Andrew Lee
2014-07-28 12:40 GMT-07:00 Andrew Lee alee...@hotmail.com: Hi All, Not sure if anyone has run into this problem, but it exists in spark 1.0.0 when you specify the location in conf/spark-defaults.conf for spark.eventLog.dir hdfs:///user/$USER/spark/logs to use the $USER env variable

RE: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-28 Thread Andrew Lee
files explicitly to the --jars option and it worked fine. The Caused by... messages were found in the yarn logs actually; I think it might be useful if I could see them from the console which runs spark-submit. Would that be possible? Jianshi On Sat, Jul 26, 2014 at 7:08 AM, Andrew Lee alee

RE: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-25 Thread Andrew Lee
Hi Jianshi, Could you provide which HBase version you're using? By the way, a quick sanity check on whether the Workers can access HBase? Were you able to manually write one record to HBase with the serialize function? Hardcode and test it ? From: jianshi.hu...@gmail.com Date: Fri, 25 Jul 2014

akka 2.3.x?

2014-07-23 Thread Lee Mighdoll
-cassandra-connector rather than the hadoop back end? Cheers, Lee

RE: Hive From Spark

2014-07-22 Thread Andrew Lee
for Hive-on-Spark now. On Mon, Jul 21, 2014 at 6:27 PM, Andrew Lee alee...@hotmail.com wrote: Hive and Hadoop are using an older version of guava libraries (11.0.1) where Spark Hive is using guava 14.0.1+. The community isn't willing to downgrade to 11.0.1 which is the current version

Re: spark sql left join gives KryoException: Buffer overflow

2014-07-21 Thread Pei-Lun Lee
: Unfortunately, this is a query where we just don't have an efficiently implementation yet. You might try switching the table order. Here is the JIRA for doing something more efficient: https://issues.apache.org/jira/browse/SPARK-2212 On Fri, Jul 18, 2014 at 7:05 AM, Pei-Lun Lee pl

RE: Hive From Spark

2014-07-21 Thread Andrew Lee
Hi All, Currently, if you are running the Spark HiveContext API with Hive 0.12, it won't work due to the following 2 libraries, which are not consistent between Hive 0.12 and Hadoop. (Hive libs align with Hadoop libs and, as a common practice, they should be consistent to interoperate.)

spark sql left join gives KryoException: Buffer overflow

2014-07-18 Thread Pei-Lun Lee
: com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 1 Looks like spark sql tried to do a broadcast join, collecting one of the tables to the master, but it is too large. How do we explicitly control join behavior like this? -- Pei-Lun Lee

SeattleSparkMeetup: Spark at eBay - Troubleshooting the everyday issues

2014-07-18 Thread Denny Lee
We're coming off a great Seattle Spark Meetup session with Evan Chan (@evanfchan) Interactive OLAP Queries with @ApacheSpark and #Cassandra  (http://www.slideshare.net/EvanChan2/2014-07olapcassspark) at Whitepages.  Now, we're proud to announce that our next session is Spark at eBay -

Re: Spark SQL 1.0.1 error on reading fixed length byte array

2014-07-15 Thread Pei-Lun Lee
, but there is a PR open to fix it: https://issues.apache.org/jira/browse/SPARK-2446 On Mon, Jul 14, 2014 at 4:17 AM, Pei-Lun Lee pl...@appier.com wrote: Hi, I am using spark-sql 1.0.1 to load parquet files generated from method described in: https://gist.github.com/massie/7224868 When I

Re: Spark SQL 1.0.1 error on reading fixed length byte array

2014-07-15 Thread Pei-Lun Lee
Filed SPARK-2446 2014-07-15 16:17 GMT+08:00 Michael Armbrust mich...@databricks.com: Oh, maybe not. Please file another JIRA. On Tue, Jul 15, 2014 at 12:34 AM, Pei-Lun Lee pl...@appier.com wrote: Hi Michael, Good to know it is being handled. I tried master branch (9fe693b5) and got

Re: Spark SQL 1.0.1 error on reading fixed length byte array

2014-07-15 Thread Pei-Lun Lee
Sorry, should be SPARK-2489 2014-07-15 19:22 GMT+08:00 Pei-Lun Lee pl...@appier.com: Filed SPARK-2446 2014-07-15 16:17 GMT+08:00 Michael Armbrust mich...@databricks.com: Oh, maybe not. Please file another JIRA. On Tue, Jul 15, 2014 at 12:34 AM, Pei-Lun Lee pl...@appier.com wrote

Spark SQL 1.0.1 error on reading fixed length byte array

2014-07-14 Thread Pei-Lun Lee
Hi, I am using spark-sql 1.0.1 to load parquet files generated from method described in: https://gist.github.com/massie/7224868 When I try to submit a select query with columns of type fixed length byte array, the following error pops up: 14/07/14 11:09:14 INFO scheduler.DAGScheduler: Failed

RE: SPARK_CLASSPATH Warning

2014-07-11 Thread Andrew Lee
As mentioned, deprecated in Spark 1.0+. Try to use --driver-class-path: ./bin/spark-shell --driver-class-path yourlib.jar:abc.jar:xyz.jar Don't use the glob *; specify the JARs one by one, separated by colons. Date: Wed, 9 Jul 2014 13:45:07 -0700 From: kat...@cs.pitt.edu Subject: SPARK_CLASSPATH Warning

RE: spark-1.0.0-rc11 2f1dc868 spark-shell not honoring --properties-file option?

2014-07-11 Thread Andrew Lee
Ok, I found it on JIRA SPARK-2390: https://issues.apache.org/jira/browse/SPARK-2390 So it looks like this is a known issue. From: alee...@hotmail.com To: user@spark.apache.org Subject: spark-1.0.0-rc11 2f1dc868 spark-shell not honoring --properties-file option? Date: Tue, 8 Jul 2014 15:17:00

spark-1.0.0-rc11 2f1dc868 spark-shell not honoring --properties-file option?

2014-07-08 Thread Andrew Lee
Build: Spark 1.0.0 rc11 (git commit tag: 2f1dc868e5714882cf40d2633fb66772baf34789) Hi All, When I enabled the spark-defaults.conf for the eventLog, spark-shell broke while spark-submit works. I'm trying to create a separate directory per user to keep track with their own Spark job event

RE: Spark logging strategy on YARN

2014-07-07 Thread Andrew Lee
Hi Kudryavtsev, Here's what I am doing as a common practice and reference; I don't want to say it is best practice since it requires a lot of customer experience and feedback, but from a development and operating standpoint, it will be great to separate the YARN container logs from the Spark

Re: Run spark unit test on Windows 7

2014-07-03 Thread Denny Lee
=hdinsight 2) put this file into d:\winutil\bin 3) add in my test: System.setProperty(hadoop.home.dir, d:\\winutil\\) after that test runs Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 10:24 PM, Denny Lee denny.g@gmail.com wrote: You don't actually need it per se - its just that some

Re: Run spark unit test on Windows 7

2014-07-03 Thread Denny Lee
Thanks! will take a look at this later today. HTH! On Jul 3, 2014, at 11:09 AM, Kostiantyn Kudriavtsev kudryavtsev.konstan...@gmail.com wrote: Hi Denny, just created https://issues.apache.org/jira/browse/SPARK-2356 On Jul 3, 2014, at 7:06 PM, Denny Lee denny.g@gmail.com wrote

Re: Run spark unit test on Windows 7

2014-07-02 Thread Denny Lee
By any chance do you have HDP 2.1 installed? you may need to install the utils and update the env variables per http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote:

Re: Run spark unit test on Windows 7

2014-07-02 Thread Denny Lee
issue. On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev kudryavtsev.konstan...@gmail.com wrote: No, I don’t why do I need to have HDP installed? I don’t use Hadoop at all and I’d like to read data from local filesystem On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com

RE: write event logs with YARN

2014-07-02 Thread Andrew Lee
Hi Christophe, Make sure you have 3 slashes in the hdfs scheme. e.g. hdfs:///server_name:9000/user/user_name/spark-events and in the spark-defaults.conf as well. spark.eventLog.dir=hdfs:///server_name:9000/user/user_name/spark-events Date: Thu, 19 Jun 2014 11:18:51 +0200 From:

Re: LiveListenerBus throws exception and weird web UI bug

2014-06-26 Thread Pei-Lun Lee
submitted. Don’t know if that can help. On Jun 26, 2014, at 6:41 AM, Pei-Lun Lee pl...@appier.com wrote: Hi, We have a long running spark application runs on spark 1.0 standalone server and after it runs several hours the following exception shows up: 14/06/25 23:13:08 ERROR

RE: HDFS folder .sparkStaging not deleted and filled up HDFS in yarn mode

2014-06-23 Thread Andrew Lee
I checked the source code, it looks like it was re-added back based on JIRA SPARK-1588, but I don't know if there's any test case associated with this? SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN. Sandy Ryza sa...@cloudera.com 2014-04-29 12:54:02 -0700

Re: Enormous EC2 price jump makes r3.large patch more important

2014-06-18 Thread Jeremy Lee
, 2014 at 9:29 PM, Jeremy Lee unorthodox.engine...@gmail.com wrote: I am about to spin up some new clusters, so I may give that a go... any special instructions for making them work? I assume I use the --spark-git-repo= option on the spark-ec2 command. Is it as easy as concatenating your

HDFS folder .sparkStaging not deleted and filled up HDFS in yarn mode

2014-06-18 Thread Andrew Lee
Hi All, Has anyone run into the same problem? By looking at the source code in the official release (rc11), this property setting is set to false by default; however, I'm seeing the .sparkStaging folder remain on the HDFS and fill up the disk pretty fast since SparkContext deploys

RE: HDFS folder .sparkStaging not deleted and filled up HDFS in yarn mode

2014-06-18 Thread Andrew Lee
Forgot to mention that I am using spark-submit to submit jobs, and a verbose-mode printout looks like this with the SparkPi example. The .sparkStaging won't be deleted. My thought is that this should be part of the staging and should be cleaned up as well when sc gets terminated.

Enormous EC2 price jump makes r3.large patch more important

2014-06-17 Thread Jeremy Lee
on that issue. Let me know if I can help with testing and whatnot. -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Enormous EC2 price jump makes r3.large patch more important

2014-06-17 Thread Jeremy Lee
a 1.0.1 release soon (this patch being one of the main reasons), but if you are itching for this sooner, you can just checkout the head of branch-1.0 and you will be able to use r3.XXX instances. - Patrick On Tue, Jun 17, 2014 at 4:17 PM, Jeremy Lee unorthodox.engine...@gmail.com wrote

Re: Spark SQL incorrect result on GROUP BY query

2014-06-12 Thread Pei-Lun Lee
-Lun Lee pl...@appier.com wrote: Hi, I am using spark 1.0.0 and found in spark sql some queries use GROUP BY give weird results. To reproduce, type the following commands in spark-shell connecting to a standalone server: case class Foo(k: String, v: Int) val sqlContext = new

Spark SQL incorrect result on GROUP BY query

2014-06-11 Thread Pei-Lun Lee
], [c,270], [4,56], [1,1]) and if I run the same query again, the new result will be correct: sql(select k,count(*) from foo group by k).collect res2: Array[org.apache.spark.sql.Row] = Array([b,200], [a,100], [c,300]) Should I file a bug? -- Pei-Lun Lee

Re: Best practise for 'Streaming' dumps?

2014-06-08 Thread Jeremy Lee
I read it more carefully, and window() might actually work for some other stuff like logs. (assuming I can have multiple windows with entirely different attributes on a single stream..) Thanks for that! On Sun, Jun 8, 2014 at 11:11 PM, Jeremy Lee unorthodox.engine...@gmail.com wrote: Yes

Are scala.MatchError messages a problem?

2014-06-08 Thread Jeremy Lee
I shut down my first (working) cluster and brought up a fresh one... and It's been a bit of a horror and I need to sleep now. Should I be worried about these errors? Or did I just have the old log4j.config tuned so I didn't see them? I 14/06/08 16:32:52 ERROR scheduler.JobScheduler: Error

Re: Are scala.MatchError messages a problem?

2014-06-08 Thread Jeremy Lee
of learning maven, if it means I never have to use sbt again. Does it mean that? -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: New user streaming question

2014-06-06 Thread Jeremy Lee
and the StreamingContext uses the network to read words, but as I said, nothing comes out. I tried changing the .print() to .saveAsTextFiles(), and I AM getting a file, but nothing is in it other than a _temporary subdir. I'm sure I'm confused here, but not sure where. Help? -- Jeremy Lee

Best practise for 'Streaming' dumps?

2014-06-06 Thread Jeremy Lee
persistent data for a streaming app? (Across restarts) And to clean up on termination? -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Can't seem to link external/twitter classes from my own app

2014-06-05 Thread Jeremy Lee
, 2014 at 5:46 PM, Nick Pentreath nick.pentre...@gmail.com wrote: Great - well we do hope we hear from you, since the user list is for interesting success stories and anecdotes, as well as blog posts etc too :) On Thu, Jun 5, 2014 at 9:40 AM, Jeremy Lee unorthodox.engine...@gmail.com wrote

Twitter feed options?

2014-06-05 Thread Jeremy Lee
! -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Twitter feed options?

2014-06-05 Thread Jeremy Lee
Nope, sorry, nevermind! I looked at the source, and it was pretty obvious that it didn't implement that yet, so I've ripped the classes out and am mutating them into new receivers right now... ... starting to get the hang of this. On Fri, Jun 6, 2014 at 1:07 PM, Jeremy Lee unorthodox.engine

Re: Yay for 1.0.0! EC2 Still has problems.

2014-06-04 Thread Jeremy Lee
, I'm sure I'll get there. But I do understand the implications of a mixed functional-imperative language with closures and lambdas. That is serious voodoo. -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Can't seem to link external/twitter classes from my own app

2014-06-04 Thread Jeremy Lee
://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22 The name is spark-streaming-twitter_2.10 On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee unorthodox.engine...@gmail.com wrote: Man, this has been hard going. Six days, and I finally got a Hello World App

Re: Why Scala?

2014-06-04 Thread Jeremy Lee
http://nabble.com/. -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Can't seem to link external/twitter classes from my own app

2014-06-04 Thread Jeremy Lee
if creating Uberjars takes this long every... single... time... On Thu, Jun 5, 2014 at 8:52 AM, Jeremy Lee unorthodox.engine...@gmail.com wrote: Thanks Patrick! Uberjars. Cool. I'd actually heard of them. And thanks for the link to the example! I shall work through that today. I'm still learning sbt

Re: Yay for 1.0.0! EC2 Still has problems.

2014-06-03 Thread Jeremy Lee
/SPARK-1990 to track this. Matei On Jun 1, 2014, at 6:14 PM, Jeremy Lee unorthodox.engine...@gmail.com wrote: Sort of.. there were two separate issues, but both related to AWS.. I've sorted the confusion about the Master/Worker AMI ... use the version chosen by the scripts. (and use

Re: Spark on EC2

2014-06-01 Thread Jeremy Lee
.html Sent from the Apache Spark User List mailing list archive at Nabble.com. -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Yay for 1.0.0! EC2 Still has problems.

2014-06-01 Thread Jeremy Lee
Lee BCompSci(Hons) The Unorthodox Engineers

Re: Trouble with EC2

2014-06-01 Thread Jeremy Lee
/10.100.75.70:38485 -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Yay for 1.0.0! EC2 Still has problems.

2014-06-01 Thread Jeremy Lee
-a that allows you to give a specific AMI. This flag is just an internal tool that we use for testing when we spin new AMI's. Users can't set that to an arbitrary AMI because we tightly control things like the Java and OS versions, libraries, etc. On Sun, Jun 1, 2014 at 12:51 AM, Jeremy Lee

Re: Yay for 1.0.0! EC2 Still has problems.

2014-05-31 Thread Jeremy Lee
12.04 AMI... that might be a good place to start. But if there is a straightforward way to make them compatible with 2.6 we should do that. For r3.large, we can add that to the script. It's a newer type. Any interest in contributing this? - Patrick On May 30, 2014 5:08 AM, Jeremy Lee

Re: Yay for 1.0.0! EC2 Still has problems.

2014-05-31 Thread Jeremy Lee
to bite the bullet and start building my own AMI's from scratch... if anyone can save me from that, I'd be most grateful. -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Seattle Spark Meetup: xPatterns Slides and @pacoid session next week!

2014-05-23 Thread Denny Lee
For those whom were not able to attend the last Seattle Spark Meetup, we had a great session by Claudiu Barbura on xPatterns on Spark, Shark, Tachyon, and Mesos - you can find the slides at: http://www.slideshare.net/ClaudiuBarbura/seattle-spark-meetup-may-2014. As well, check out the next

Is spark 1.0.0 spark-shell --master=yarn running in yarn-cluster mode or yarn-client mode?

2014-05-21 Thread Andrew Lee
Does anyone know if: ./bin/spark-shell --master yarn is running yarn-cluster or yarn-client by default? Based on the source code: ./core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala if (args.deployMode == "cluster" && args.master.startsWith("yarn")) { args.master = "yarn-cluster"

RE: Is spark 1.0.0 spark-shell --master=yarn running in yarn-cluster mode or yarn-client mode?

2014-05-21 Thread Andrew Lee
: if (args.deployMode != "cluster" && args.master.startsWith("yarn")) { args.master = "yarn-client" } 2014-05-21 10:57 GMT-07:00 Andrew Lee alee...@hotmail.com: Does anyone know if: ./bin/spark-shell --master yarn is running yarn-cluster or yarn-client by default? Based on the source code: ./core/src

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-06 Thread Andrew Lee
- (512) 286-6075 Andrew Lee ---05/04/2014 09:57:08 PM---Hi Jacob, Taking both concerns into account, I'm actually thinking about using a separate subnet to From: Andrew Lee alee...@hotmail.com To: user@spark.apache.org user@spark.apache.org Date: 05/04/2014 09:57 PM Subject

RE: run spark0.9.1 on yarn with hadoop CDH4

2014-05-06 Thread Andrew Lee
Please check JAVA_HOME. Usually it should point to /usr/java/default on CentOS/Linux. or FYI: http://stackoverflow.com/questions/1117398/java-home-directory Date: Tue, 6 May 2014 00:23:02 -0700 From: sln-1...@163.com To: u...@spark.incubator.apache.org Subject: run spark0.9.1 on yarn with

Spark 0.9.1 - saveAsSequenceFile and large RDD

2014-05-05 Thread Allen Lee
pairs // set parallelism to 1 to keep the file from being partitioned sc.makeRDD(kv, 1).saveAsSequenceFile(path) Does anyone have any pointers on how to get past this? Thanks, -- *Allen Lee* Software Engineer MediaCrossing Inc.
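Lightly completed, the fragment above might look like this. The path and data are illustrative, and the implicit Writable conversions from SparkContext._ (Spark 0.9/1.x) are assumed:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // implicits for saveAsSequenceFile

val sc = new SparkContext("local", "seqfile-example")
val kv = Seq(("key1", "value1"), ("key2", "value2"))

// Parallelism 1 keeps the output from being split across part files.
sc.makeRDD(kv, 1).saveAsSequenceFile("/tmp/seqfile-out")
```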

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-04 Thread Andrew Lee
.nabble.com/Securing-Spark-s-Network-tp4832p4984.html [2] http://en.wikipedia.org/wiki/Ephemeral_port [3] http://www.cyberciti.biz/tips/linux-increase-outgoing-network-sockets-range.html Jacob D. Eisinger IBM Emerging Technologies jeis...@us.ibm.com - (512) 286-6075 Andrew Lee ---05/02/2014

spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-02 Thread Andrew Lee
Hi All, I encountered this problem when the firewall is enabled between the spark-shell and the Workers. When I launch spark-shell in yarn-client mode, I notice that Workers on the YARN containers are trying to talk to the driver (spark-shell); however, the firewall is not open, which caused

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-02 Thread Andrew Lee
-0400 Subject: Re: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication From: yana.kadiy...@gmail.com To: user@spark.apache.org I think what you want to do is set spark.driver.port to a fixed port. On Fri, May 2, 2014 at 1:52 PM, Andrew Lee alee...@hotmail.com

Seattle Spark Meetup Slides

2014-05-02 Thread Denny Lee
We’ve had some pretty awesome presentations at the Seattle Spark Meetup - here are the links to the various slides: Seattle Spark Meetup KickOff with DataBricks | Introduction to Spark with Matei Zaharia and Pat McDonough Learnings from Running Spark at Twitter sessions Ben Hindman’s Mesos

Re: Spark Training

2014-05-01 Thread Denny Lee
You may also want to check out Paco Nathan's Introduction to Spark courses: http://liber118.com/pxn/ On May 1, 2014, at 8:20 AM, Mayur Rustagi mayur.rust...@gmail.com wrote: Hi Nicholas, We provide training on spark, hands-on also associated ecosystem. We gave it recently at a

CDH5 Spark on EC2

2014-04-02 Thread Denny Lee
I’ve been able to get CDH5 up and running on EC2 and according to Cloudera Manager, Spark is running healthy. But when I try to run spark-shell, I eventually get the error: 14/04/02 07:18:18 INFO client.AppClient$ClientActor: Connecting to master  spark://ip-172-xxx-xxx-xxx:7077... 14/04/02

Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread Denny Lee
If you have any questions on helping to get a Spark Meetup off the ground, please do not hesitate to ping me (denny.g@gmail.com).  I helped jump start the one here in Seattle (and tangentially have been helping the Vancouver and Denver ones as well).  HTH! On March 31, 2014 at 12:35:38

Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?

2014-03-25 Thread Andrew Lee
Hi All, I'm getting the following error when I execute start-master.sh which also invokes spark-class at the end. Failed to find Spark assembly in /root/spark/assembly/target/scala-2.10/ You need to build Spark with 'sbt/sbt assembly' before running this program. After digging into the

RE: Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?

2014-03-25 Thread Andrew Lee
to the jar itself so no need for random class paths. On Tue, Mar 25, 2014 at 1:47 PM, Andrew Lee alee...@hotmail.com wrote: Hi All, I'm getting the following error when I execute start-master.sh which also invokes spark-class at the end. Failed to find Spark assembly in /root/spark/assembly