Re: What is the range of the PageRank value of graphx

2023-03-28 Thread lee
To: lee | Cc: user@spark.apache.org | Subject: Re: What is the range of the PageRank value of graphx. From the docs: "Note that this is not the 'normalized' PageRank and as a consequence pages that have no inlinks will have a PageRank of alpha. In particular, the pageranks may"

What is the range of the PageRank value of graphx

2023-03-28 Thread lee
When I calculate PageRank using HugeGraph, each PageRank value is less than 1, and the total of the PageRanks is 1. However, the PageRank value of GraphX is often greater than 1, so what is the range of the PageRank value of GraphX? -- 李杰 | leedd1...@163.com
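As the reply above explains, GraphX PageRank is unnormalized: values routinely exceed 1, and pages with no inlinks sit at alpha (the reset probability, 0.15 by default). To compare against HugeGraph's normalized scores, divide each rank by the sum of all ranks. A minimal plain-Scala sketch (the rank values below are made up for illustration):

```scala
// Hypothetical unnormalized ranks as GraphX might return them
// (vertex id -> rank); the numbers are illustrative only.
val ranks: Map[Long, Double] = Map(1L -> 1.8, 2L -> 0.9, 3L -> 0.15)

// Dividing by the total turns the ranks into a probability
// distribution that sums to 1, matching HugeGraph-style output.
val total = ranks.values.sum
val normalized = ranks.map { case (id, r) => id -> r / total }
```

The relative ordering of vertices is unchanged by this rescaling, so it is safe to apply after the fact.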

Unsubscribe

2023-06-29 Thread lee
Unsubscribe -- 李杰 | leedd1...@163.com

Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?

2014-03-25 Thread Andrew Lee
Hi All, I'm getting the following error when I execute start-master.sh which also invokes spark-class at the end. Failed to find Spark assembly in /root/spark/assembly/target/scala-2.10/ You need to build Spark with 'sbt/sbt assembly' before running this program. After digging into the

RE: Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?

2014-03-25 Thread Andrew Lee
to the jar itself so no need for random class paths. On Tue, Mar 25, 2014 at 1:47 PM, Andrew Lee alee...@hotmail.com wrote: Hi All, I'm getting the following error when I execute start-master.sh which also invokes spark-class at the end. Failed to find Spark assembly in /root/spark/assembly

Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread Denny Lee
If you have any questions on helping to get a Spark Meetup off the ground, please do not hesitate to ping me (denny.g@gmail.com).  I helped jump start the one here in Seattle (and tangentially have been helping the Vancouver and Denver ones as well).  HTH! On March 31, 2014 at 12:35:38

CDH5 Spark on EC2

2014-04-02 Thread Denny Lee
I’ve been able to get CDH5 up and running on EC2 and according to Cloudera Manager, Spark is running healthy. But when I try to run spark-shell, I eventually get the error: 14/04/02 07:18:18 INFO client.AppClient$ClientActor: Connecting to master  spark://ip-172-xxx-xxx-xxx:7077... 14/04/02

Re: Spark Training

2014-05-01 Thread Denny Lee
You may also want to check out Paco Nathan's Introduction to Spark courses: http://liber118.com/pxn/ On May 1, 2014, at 8:20 AM, Mayur Rustagi mayur.rust...@gmail.com wrote: Hi Nicholas, We provide training on spark, hands-on also associated ecosystem. We gave it recently at a

spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-02 Thread Andrew Lee
Hi All, I encountered this problem when the firewall is enabled between the spark-shell and the Workers. When I launch spark-shell in yarn-client mode, I notice that Workers on the YARN containers are trying to talk to the driver (spark-shell), however, the firewall is not opened and caused

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-02 Thread Andrew Lee
-0400 Subject: Re: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication From: yana.kadiy...@gmail.com To: user@spark.apache.org I think what you want to do is set spark.driver.port to a fixed port. On Fri, May 2, 2014 at 1:52 PM, Andrew Lee alee...@hotmail.com
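The advice in this reply is to pin the driver to a fixed port so a firewall rule can be opened for it. A hedged spark-defaults.conf sketch (the port number is a placeholder; pick one your firewall policy allows):

```properties
# Pin the driver-side listener to a known port so the YARN containers
# can reach it through the firewall. 51000 is a hypothetical choice.
spark.driver.port    51000
```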

Seattle Spark Meetup Slides

2014-05-02 Thread Denny Lee
We’ve had some pretty awesome presentations at the Seattle Spark Meetup - here are the links to the various slides: Seattle Spark Meetup KickOff with DataBricks | Introduction to Spark with Matei Zaharia and Pat McDonough Learnings from Running Spark at Twitter sessions Ben Hindman’s Mesos

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-04 Thread Andrew Lee
.nabble.com/Securing-Spark-s-Network-tp4832p4984.html [2] http://en.wikipedia.org/wiki/Ephemeral_port [3] http://www.cyberciti.biz/tips/linux-increase-outgoing-network-sockets-range.html Jacob D. Eisinger IBM Emerging Technologies jeis...@us.ibm.com - (512) 286-6075 Andrew Lee ---05/02/2014

Spark 0.9.1 - saveAsSequenceFile and large RDD

2014-05-05 Thread Allen Lee
pairs //set parallelism to 1 to keep the file from being partitioned sc.makeRDD(kv,1) .saveAsSequenceFile(path) Does anyone have any pointers on how to get past this? Thanks, -- *Allen Lee* Software Engineer MediaCrossing Inc.

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-06 Thread Andrew Lee
- (512) 286-6075 Andrew Lee ---05/04/2014 09:57:08 PM---Hi Jacob, Taking both concerns into account, I'm actually thinking about using a separate subnet to From: Andrew Lee alee...@hotmail.com To: user@spark.apache.org user@spark.apache.org Date: 05/04/2014 09:57 PM Subject

RE: run spark0.9.1 on yarn with hadoop CDH4

2014-05-06 Thread Andrew Lee
Please check JAVA_HOME. Usually it should point to /usr/java/default on CentOS/Linux. or FYI: http://stackoverflow.com/questions/1117398/java-home-directory Date: Tue, 6 May 2014 00:23:02 -0700 From: sln-1...@163.com To: u...@spark.incubator.apache.org Subject: run spark0.9.1 on yarn with
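A quick sketch of the suggested check, assuming the typical CentOS layout mentioned in the reply; adjust the path to wherever your JDK actually lives:

```shell
# Point JAVA_HOME at the JDK install (CentOS convention; hypothetical path)
export JAVA_HOME=/usr/java/default
export PATH="$JAVA_HOME/bin:$PATH"
```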

Is spark 1.0.0 spark-shell --master=yarn running in yarn-cluster mode or yarn-client mode?

2014-05-21 Thread Andrew Lee
Does anyone know if: ./bin/spark-shell --master yarn is running yarn-cluster or yarn-client by default? Based on the source code: ./core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala if (args.deployMode == "cluster" && args.master.startsWith("yarn")) { args.master = "yarn-cluster"

RE: Is spark 1.0.0 spark-shell --master=yarn running in yarn-cluster mode or yarn-client mode?

2014-05-21 Thread Andrew Lee
: if (args.deployMode != "cluster" && args.master.startsWith("yarn")) { args.master = "yarn-client" } 2014-05-21 10:57 GMT-07:00 Andrew Lee alee...@hotmail.com: Does anyone know if: ./bin/spark-shell --master yarn is running yarn-cluster or yarn-client by default? Based on the source code: ./core/src
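The two conditions quoted from SparkSubmit.scala can be condensed into a pure function, which makes the answer visible: a bare `--master yarn` resolves to yarn-client (the mode spark-shell needs, since its driver runs locally), and yarn-cluster is chosen only when deploy mode is explicitly "cluster". This is a sketch of that logic, not the actual SparkSubmit code:

```scala
// Pure-logic sketch of the master resolution quoted above (Spark 1.0 era).
def resolveMaster(master: String, deployMode: String): String =
  if (master.startsWith("yarn")) {
    if (deployMode == "cluster") "yarn-cluster" else "yarn-client"
  } else master
```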

Seattle Spark Meetup: xPatterns Slides and @pacoid session next week!

2014-05-23 Thread Denny Lee
For those who were not able to attend the last Seattle Spark Meetup, we had a great session by Claudiu Barbura on xPatterns on Spark, Shark, Tachyon, and Mesos - you can find the slides at: http://www.slideshare.net/ClaudiuBarbura/seattle-spark-meetup-may-2014. As well, check out the next

Re: Yay for 1.0.0! EC2 Still has problems.

2014-05-31 Thread Jeremy Lee
12.04 AMI... that might be a good place to start. But if there is a straightforward way to make them compatible with 2.6 we should do that. For r3.large, we can add that to the script. It's a newer type. Any interest in contributing this? - Patrick On May 30, 2014 5:08 AM, Jeremy Lee

Re: Yay for 1.0.0! EC2 Still has problems.

2014-05-31 Thread Jeremy Lee
to bite the bullet and start building my own AMI's from scratch... if anyone can save me from that, I'd be most grateful. -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Spark on EC2

2014-06-01 Thread Jeremy Lee
.html Sent from the Apache Spark User List mailing list archive at Nabble.com. -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Yay for 1.0.0! EC2 Still has problems.

2014-06-01 Thread Jeremy Lee
Lee BCompSci(Hons) The Unorthodox Engineers

Re: Trouble with EC2

2014-06-01 Thread Jeremy Lee
/10.100.75.70:38485 -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Yay for 1.0.0! EC2 Still has problems.

2014-06-01 Thread Jeremy Lee
-a that allows you to give a specific AMI. This flag is just an internal tool that we use for testing when we spin new AMI's. Users can't set that to an arbitrary AMI because we tightly control things like the Java and OS versions, libraries, etc. On Sun, Jun 1, 2014 at 12:51 AM, Jeremy Lee

Re: Yay for 1.0.0! EC2 Still has problems.

2014-06-03 Thread Jeremy Lee
/SPARK-1990 to track this. Matei On Jun 1, 2014, at 6:14 PM, Jeremy Lee unorthodox.engine...@gmail.com wrote: Sort of.. there were two separate issues, but both related to AWS.. I've sorted the confusion about the Master/Worker AMI ... use the version chosen by the scripts. (and use

Re: Yay for 1.0.0! EC2 Still has problems.

2014-06-04 Thread Jeremy Lee
, I'm sure I'll get there. But I do understand the implications of a mixed functional-imperative language with closures and lambdas. That is serious voodoo. -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Can't seem to link external/twitter classes from my own app

2014-06-04 Thread Jeremy Lee
://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22 The name is spark-streaming-twitter_2.10 On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee unorthodox.engine...@gmail.com wrote: Man, this has been hard going. Six days, and I finally got a Hello World App

Re: Why Scala?

2014-06-04 Thread Jeremy Lee
http://nabble.com/. -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Can't seem to link external/twitter classes from my own app

2014-06-04 Thread Jeremy Lee
if creating Uberjars takes this long every... single... time... On Thu, Jun 5, 2014 at 8:52 AM, Jeremy Lee unorthodox.engine...@gmail.com wrote: Thanks Patrick! Uberjars. Cool. I'd actually heard of them. And thanks for the link to the example! I shall work through that today. I'm still learning sbt

Re: Can't seem to link external/twitter classes from my own app

2014-06-05 Thread Jeremy Lee
, 2014 at 5:46 PM, Nick Pentreath nick.pentre...@gmail.com wrote: Great - well we do hope we hear from you, since the user list is for interesting success stories and anecdotes, as well as blog posts etc too :) On Thu, Jun 5, 2014 at 9:40 AM, Jeremy Lee unorthodox.engine...@gmail.com wrote

Twitter feed options?

2014-06-05 Thread Jeremy Lee
! -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Twitter feed options?

2014-06-05 Thread Jeremy Lee
Nope, sorry, nevermind! I looked at the source, and it was pretty obvious that it didn't implement that yet, so I've ripped the classes out and am mutating them into new receivers right now... ... starting to get the hang of this. On Fri, Jun 6, 2014 at 1:07 PM, Jeremy Lee unorthodox.engine

Re: New user streaming question

2014-06-06 Thread Jeremy Lee
and the StreamingContext uses the network to read words, but as I said, nothing comes out. I tried changing the .print() to .saveAsTextFiles(), and I AM getting a file, but nothing is in it other than a _temporary subdir. I'm sure I'm confused here, but not sure where. Help? -- Jeremy Lee

Best practise for 'Streaming' dumps?

2014-06-06 Thread Jeremy Lee
persistent data for a streaming app? (Across restarts) And to clean up on termination? -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Best practise for 'Streaming' dumps?

2014-06-08 Thread Jeremy Lee
I read it more carefully, and window() might actually work for some other stuff like logs. (assuming I can have multiple windows with entirely different attributes on a single stream..) Thanks for that! On Sun, Jun 8, 2014 at 11:11 PM, Jeremy Lee unorthodox.engine...@gmail.com wrote: Yes

Are scala.MatchError messages a problem?

2014-06-08 Thread Jeremy Lee
I shut down my first (working) cluster and brought up a fresh one... and It's been a bit of a horror and I need to sleep now. Should I be worried about these errors? Or did I just have the old log4j.config tuned so I didn't see them? I 14/06/08 16:32:52 ERROR scheduler.JobScheduler: Error

Re: Are scala.MatchError messages a problem?

2014-06-08 Thread Jeremy Lee
of learning maven, if it means I never have to use sbt again. Does it mean that? -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Enormous EC2 price jump makes r3.large patch more important

2014-06-17 Thread Jeremy Lee
on that issue. Let me know if I can help with testing and whatnot. -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Enormous EC2 price jump makes r3.large patch more important

2014-06-17 Thread Jeremy Lee
a 1.0.1 release soon (this patch being one of the main reasons), but if you are itching for this sooner, you can just checkout the head of branch-1.0 and you will be able to use r3.XXX instances. - Patrick On Tue, Jun 17, 2014 at 4:17 PM, Jeremy Lee unorthodox.engine...@gmail.com wrote

Re: Enormous EC2 price jump makes r3.large patch more important

2014-06-18 Thread Jeremy Lee
, 2014 at 9:29 PM, Jeremy Lee unorthodox.engine...@gmail.com wrote: I am about to spin up some new clusters, so I may give that a go... any special instructions for making them work? I assume I use the --spark-git-repo= option on the spark-ec2 command. Is it as easy as concatenating your

HDFS folder .sparkStaging not deleted and filled up HDFS in yarn mode

2014-06-18 Thread Andrew Lee
Hi All, Has anyone run into the same problem? By looking at the source code in the official release (rc11), this property setting is set to false by default, however, I'm seeing the .sparkStaging folder remain on the HDFS and cause it to fill up the disk pretty fast since SparkContext deploys

RE: HDFS folder .sparkStaging not deleted and filled up HDFS in yarn mode

2014-06-18 Thread Andrew Lee
Forgot to mention that I am using spark-submit to submit jobs, and a verbose mode print out looks like this with the SparkPi examples.The .sparkStaging won't be deleted. My thoughts is that this should be part of the staging and should be cleaned up as well when sc gets terminated.

RE: HDFS folder .sparkStaging not deleted and filled up HDFS in yarn mode

2014-06-23 Thread Andrew Lee
I checked the source code, it looks like it was re-added back based on JIRA SPARK-1588, but I don't know if there's any test case associated with this? SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN. Sandy Ryza sa...@cloudera.com 2014-04-29 12:54:02 -0700

Re: Run spark unit test on Windows 7

2014-07-02 Thread Denny Lee
By any chance do you have HDP 2.1 installed? you may need to install the utils and update the env variables per http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote:

Re: Run spark unit test on Windows 7

2014-07-02 Thread Denny Lee
issue. On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev kudryavtsev.konstan...@gmail.com wrote: No, I don’t why do I need to have HDP installed? I don’t use Hadoop at all and I’d like to read data from local filesystem On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com

RE: write event logs with YARN

2014-07-02 Thread Andrew Lee
Hi Christophe, Make sure you have 3 slashes in the hdfs scheme. e.g. hdfs:///server_name:9000/user/user_name/spark-events and in the spark-defaults.conf as well.spark.eventLog.dir=hdfs:///server_name:9000/user/user_name/spark-events Date: Thu, 19 Jun 2014 11:18:51 +0200 From:
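A sketch of the event-log configuration this reply describes. The thread recommends three slashes; note that with an explicit host:port the more conventional URI form is `hdfs://host:port/path`, so verify which your Hadoop version accepts. Host, port, and path below are placeholders:

```properties
# spark-defaults.conf: fully qualified HDFS location for event logs.
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://server_name:9000/user/user_name/spark-events
```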

Re: Run spark unit test on Windows 7

2014-07-03 Thread Denny Lee
=hdinsight 2) put this file into d:\winutil\bin 3) add in my test: System.setProperty("hadoop.home.dir", "d:\\winutil\\") after that test runs Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 10:24 PM, Denny Lee denny.g@gmail.com wrote: You don't actually need it per se - its just that some
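The workaround in this thread, spelled out as a runnable Scala snippet: point `hadoop.home.dir` at a folder whose `bin\` contains winutils.exe before any Hadoop code is touched. The `d:\winutil\` path follows the thread; substitute wherever you placed winutils:

```scala
// Windows-only workaround for Spark/Hadoop unit tests: tell Hadoop where
// winutils.exe lives (expected under <hadoop.home.dir>\bin\winutils.exe).
System.setProperty("hadoop.home.dir", "d:\\winutil\\")
val configured = System.getProperty("hadoop.home.dir")
```

Setting the property in test setup (e.g. a beforeAll block) ensures it is in place before HDFS classes initialize.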

Re: Run spark unit test on Windows 7

2014-07-03 Thread Denny Lee
Thanks! will take a look at this later today. HTH! On Jul 3, 2014, at 11:09 AM, Kostiantyn Kudriavtsev kudryavtsev.konstan...@gmail.com wrote: Hi Denny, just created https://issues.apache.org/jira/browse/SPARK-2356 On Jul 3, 2014, at 7:06 PM, Denny Lee denny.g@gmail.com wrote

RE: Spark logging strategy on YARN

2014-07-07 Thread Andrew Lee
Hi Kudryavtsev, Here's what I am doing as a common practice and reference, I don't want to say it is best practice since it requires a lot of customer experience and feedback, but from a development and operating stand point, it will be great to separate the YARN container logs with the Spark

spark-1.0.0-rc11 2f1dc868 spark-shell not honoring --properties-file option?

2014-07-08 Thread Andrew Lee
Build: Spark 1.0.0 rc11 (git commit tag: 2f1dc868e5714882cf40d2633fb66772baf34789) Hi All, When I enabled the spark-defaults.conf for the eventLog, spark-shell broke while spark-submit works. I'm trying to create a separate directory per user to keep track with their own Spark job event

RE: SPARK_CLASSPATH Warning

2014-07-11 Thread Andrew Lee
As mentioned, deprecated in Spark 1.0+. Try to use the --driver-class-path: ./bin/spark-shell --driver-class-path yourlib.jar:abc.jar:xyz.jar Don't use glob *, specify the JAR one by one with colon. Date: Wed, 9 Jul 2014 13:45:07 -0700 From: kat...@cs.pitt.edu Subject: SPARK_CLASSPATH Warning
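The replacement invocation described above, as a command sketch; the JAR names are placeholders for your own libraries:

```shell
# Colon-separated JAR list (no glob '*'), replacing the deprecated
# SPARK_CLASSPATH environment variable in Spark 1.0+.
./bin/spark-shell --driver-class-path yourlib.jar:abc.jar:xyz.jar
```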

RE: spark-1.0.0-rc11 2f1dc868 spark-shell not honoring --properties-file option?

2014-07-11 Thread Andrew Lee
Ok, I found it on JIRA SPARK-2390: https://issues.apache.org/jira/browse/SPARK-2390 So it looks like this is a known issue. From: alee...@hotmail.com To: user@spark.apache.org Subject: spark-1.0.0-rc11 2f1dc868 spark-shell not honoring --properties-file option? Date: Tue, 8 Jul 2014 15:17:00

SeattleSparkMeetup: Spark at eBay - Troubleshooting the everyday issues

2014-07-18 Thread Denny Lee
We're coming off a great Seattle Spark Meetup session with Evan Chan (@evanfchan) Interactive OLAP Queries with @ApacheSpark and #Cassandra  (http://www.slideshare.net/EvanChan2/2014-07olapcassspark) at Whitepages.  Now, we're proud to announce that our next session is Spark at eBay -

RE: Hive From Spark

2014-07-21 Thread Andrew Lee
Hi All, Currently, if you are running the Spark HiveContext API with Hive 0.12, it won't work due to the following 2 libraries which are not consistent with Hive 0.12 and Hadoop as well. (Hive libs align with Hadoop libs, and as a common practice, they should be consistent in order to interoperate.)

RE: Hive From Spark

2014-07-22 Thread Andrew Lee
for Hive-on-Spark now. On Mon, Jul 21, 2014 at 6:27 PM, Andrew Lee alee...@hotmail.com wrote: Hive and Hadoop are using an older version of guava libraries (11.0.1) where Spark Hive is using guava 14.0.1+. The community isn't willing to downgrade to 11.0.1 which is the current version

akka 2.3.x?

2014-07-23 Thread Lee Mighdoll
-cassandra-connector rather than the hadoop back end? Cheers, Lee

RE: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-25 Thread Andrew Lee
Hi Jianshi, Could you provide which HBase version you're using? By the way, a quick sanity check on whether the Workers can access HBase? Were you able to manually write one record to HBase with the serialize function? Hardcode and test it ? From: jianshi.hu...@gmail.com Date: Fri, 25 Jul 2014

Issues on spark-shell and spark-submit behave differently on spark-defaults.conf parameter spark.eventLog.dir

2014-07-28 Thread Andrew Lee
Hi All, Not sure if anyone has run into this problem, but this exists in Spark 1.0.0 when you specify the location in conf/spark-defaults.conf for spark.eventLog.dir hdfs:///user/$USER/spark/logs to use the $USER env variable. For example, I'm running the command with user 'test'. In

RE: Issues on spark-shell and spark-submit behave differently on spark-defaults.conf parameter spark.eventLog.dir

2014-07-28 Thread Andrew Lee
2014-07-28 12:40 GMT-07:00 Andrew Lee alee...@hotmail.com: Hi All, Not sure if anyone has ran into this problem, but this exist in spark 1.0.0 when you specify the location in conf/spark-defaults.conf for spark.eventLog.dir hdfs:///user/$USER/spark/logs to use the $USER env variable

RE: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-28 Thread Andrew Lee
files explicitly to --jars option and it worked fine. The Caused by... messages were found in yarn logs actually, I think it might be useful if I can seem them from the console which runs spark-submit. Would that be possible? Jianshi On Sat, Jul 26, 2014 at 7:08 AM, Andrew Lee alee

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-31 Thread Andrew Lee
/user/hive/warehouse) On Thu, Jul 31, 2014 at 8:05 AM, Andrew Lee <alee526@> wrote: Hi All, It has been awhile, but what I did to make it work is to make sure the following: 1. Hive is working when you run Hive CLI and JDBC via Hiveserver2 2. Make sure you have

Seattle Spark Meetup: Spark at eBay - Troubleshooting the everyday issues Slides

2014-08-14 Thread Denny Lee
For those who were not able to attend the Seattle Spark Meetup - Spark at eBay - Troubleshooting the Everyday Issues, the slides have now been posted at:  http://files.meetup.com/12063092/SparkMeetupAugust2014Public.pdf. Enjoy! Denny

Re: Seattle Spark Meetup: Spark at eBay - Troubleshooting the everyday issues Slides

2014-08-15 Thread Denny Lee
Apologies but we had placed the settings for downloading the slides to Seattle Spark Meetup members only - but actually meant to share with everyone.  We have since fixed this and now you can download it.  HTH! On August 14, 2014 at 18:14:35, Denny Lee (denny.g@gmail.com) wrote

Spark-job error on writing result into hadoop w/ switch_user=false

2014-08-20 Thread Jongyoul Lee
Hi, I've used hdfs 2.3.0-cdh5.0.1, mesos 0.19.1 and spark 1.0.2 that is re-compiled. For a security reason, we run hdfs and mesos as hdfs, that is an account name and not in a root group, and non-root user submit a spark job on mesos. With no-switch_user, simple job, which only read data from

LDA example?

2014-08-21 Thread Denny Lee
Quick question - is there a handy sample / example of how to use the LDA algorithm within Spark MLLib?   Thanks! Denny

RE: Hive From Spark

2014-08-25 Thread Andrew Lee
Lee alee...@hotmail.com wrote: Hopefully there could be some progress on SPARK-2420. It looks like shading may be the voted solution among downgrading. Any idea when this will happen? Could it happen in Spark 1.1.1 or Spark 1.1.2? By the way, regarding bin/spark-sql? Is this more

Spark / Thrift / ODBC connectivity

2014-08-28 Thread Denny Lee
I’m currently using the Spark 1.1 branch and have been able to get the Thrift service up and running.  The quick questions were whether I should able to use the Thrift service to connect to SparkSQL generated tables and/or Hive tables?   As well, by any chance do we have any documents that

Re: SparkSQL HiveContext No Suitable Driver / Cannot Find Driver

2014-08-30 Thread Denny Lee
Oh, forgot to add the managed libraries and the Hive libraries within the CLASSPATH.  As soon as I did that, we’re good to go now. On August 29, 2014 at 22:55:47, Denny Lee (denny.g@gmail.com) wrote: My issue is similar to the issue as noted  http://mail-archives.apache.org/mod_mbox

Re: Spark Hive max key length is 767 bytes

2014-08-30 Thread Denny Lee
Oh, you may be running into an issue with your MySQL setup actually, try running alter database metastore_db character set latin1 so that way Hive (and the Spark HiveContext) can execute properly against the metastore. On August 29, 2014 at 04:39:01, arthur.hk.c...@gmail.com
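The fix suggested above, as a command sketch. The database name `metastore_db` follows the thread; the user name is a placeholder for whatever account owns your Hive metastore. Switching to latin1 keeps composite index keys under MySQL's 767-byte limit that Hive schemas can otherwise hit with utf8:

```shell
# Run against the MySQL instance backing the Hive metastore
# (hypothetical credentials; use your metastore account).
mysql -u hiveuser -p -e 'ALTER DATABASE metastore_db CHARACTER SET latin1;'
```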

Starting Thriftserver via hostname on Spark 1.1 RC4?

2014-09-03 Thread Denny Lee
When I start the thrift server (on Spark 1.1 RC4) via: ./sbin/start-thriftserver.sh --master spark://hostname:7077 --driver-class-path $CLASSPATH It appears that the thrift server is starting off of localhost as opposed to hostname.  I have set the spark-env.sh to use the hostname, modified the

Re: Starting Thriftserver via hostname on Spark 1.1 RC4?

2014-09-04 Thread Denny Lee
your-port This behavior is inherited from Hive since Spark SQL Thrift server is a variant of HiveServer2. ​ On Wed, Sep 3, 2014 at 10:47 PM, Denny Lee denny.g@gmail.com wrote: When I start the thrift server (on Spark 1.1 RC4) via: ./sbin/start-thriftserver.sh --master spark://hostname:7077

Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-10 Thread Denny Lee
Actually, when registering the table, it is only available within the sc context you are running it in. For Spark 1.1, the method name is changed to RegisterAsTempTable to better reflect that. The Thrift server process runs under a different process meaning that it cannot see any of the

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
I’m not sure if I’m completely answering your question here but I’m currently working (on OSX) with Hadoop 2.5 and I used the Spark 1.1 with Hadoop 2.4 without any issues. On September 11, 2014 at 18:11:46, Haopu Wang (hw...@qilinsoft.com) wrote: I see the binary packages include hadoop 1,

Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-11 Thread Denny Lee
registerTempTable you mentioned works on SqlContext instead of HiveContext. Thanks, Du On 9/10/14, 1:21 PM, Denny Lee denny.g@gmail.com wrote: Actually, when registering the table, it is only available within the sc context you are running it in. For Spark 1.1, the method name is changed

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
, but in Spark 1.1.0, there are separate packages for hadoop 2.3 and 2.4. That implies some difference in Spark according to hadoop version.   From:Denny Lee [mailto:denny.g@gmail.com] Sent: Friday, September 12, 2014 9:35 AM To: user@spark.apache.org; Haopu Wang; d...@spark.apache.org

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
to read from HDFS, you’ll need to build Spark against the specific HDFS version in your environment.”   Did you try to read a hadoop 2.5.0 file using Spark 1.1 with hadoop 2.4?   Thanks!   From:Denny Lee [mailto:denny.g@gmail.com] Sent: Friday, September 12, 2014 10:00 AM To: Patrick

Re: Spark SQL JDBC

2014-09-11 Thread Denny Lee
When you re-ran sbt did you clear out the packages first and ensure that the datanucleus jars were generated within lib_managed? I remembered having to do that when I was working testing out different configs. On Thu, Sep 11, 2014 at 10:50 AM, alexandria1101 alexandria.shea...@gmail.com wrote:

Re: Spark SQL Thrift JDBC server deployment for production

2014-09-11 Thread Denny Lee
Could you provide some context about running this in yarn-cluster mode? The Thrift server that's included within Spark 1.1 is based on Hive 0.12. Hive has been able to work against YARN since Hive 0.10. So when you start the thrift server, provided you copied the hive-site.xml over to the Spark

Re: SchemaRDD and RegisterAsTable

2014-09-17 Thread Denny Lee
The registered table is stored within the spark context itself.  To have the table available for the thrift server to get access to, you can save the sc table into the Hive context so that way the Thrift server process can see the table.  If you are using derby as your metastore, then the

Re: What is a pre built package of Apache Spark

2014-09-24 Thread Denny Lee
This seems similar to a related Windows issue concerning python where pyspark couldn't find the python executable because the PYTHONSTARTUP environment variable wasn't set - by any chance could this be related? On Wed, Sep 24, 2014 at 7:51 PM, christy 760948...@qq.com wrote: Hi I have installed standalone on

Re: Spark Hive max key length is 767 bytes

2014-09-25 Thread Denny Lee
by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) Should I use HIVE 0.12.0 instead of HIVE 0.13.1? Regards Arthur On 31 Aug, 2014, at 6:01 am, Denny Lee denny.g

Re: add external jars to spark-shell

2014-10-20 Thread Denny Lee
--jars (ADD_JARS) is a special class loading for Spark while --driver-class-path (SPARK_CLASSPATH) is captured by the startup scripts and appended to classpath settings that is used to start the JVM running the driver. You can reference https://www.concur.com/blog/en-us/connect-tableau-to-sparksql

Re: winutils

2014-10-29 Thread Denny Lee
QQ - did you download the Spark 1.1 binaries that included the Hadoop one? Does this happen if you're using the Spark 1.1 binaries that do not include the Hadoop jars? On Wed, Oct 29, 2014 at 11:31 AM, Ron Ayoub ronalday...@live.com wrote: Apparently Spark does require Hadoop even if you do not

Re: Spark + Tableau

2014-10-30 Thread Denny Lee
When you are starting the thrift server service - are you connecting to it locally or is this on a remote server when you use beeline and/or Tableau? On Thu, Oct 30, 2014 at 8:00 AM, Bojan Kostic blood9ra...@gmail.com wrote: I use beta driver SQL ODBC from Databricks. -- View this message

Spark Streaming not working in YARN mode

2014-11-19 Thread kam lee
I created a simple Spark Streaming program - it received numbers and computed averages and sent the results to Kafka. It worked perfectly in local mode as well as standalone master/slave mode across a two-node cluster. It did not work however in yarn-client or yarn-cluster mode. The job was

Re: Spark or MR, Scala or Java?

2014-11-22 Thread Denny Lee
extraction job against multiple data sources via Hadoop streaming. Another good call out but utilizing Scala within Spark is that most of the Spark code is written in Scala. On Sat, Nov 22, 2014 at 08:12 Denny Lee denny.g@gmail.com wrote: There are various scenarios where traditional Hadoop

Re: Spark SQL Programming Guide - registerTempTable Error

2014-11-23 Thread Denny Lee
By any chance are you using Spark 1.0.2? registerTempTable was introduced in Spark 1.1+ while for Spark 1.0.2, it would be registerAsTable. On Sun Nov 23 2014 at 10:59:48 AM riginos samarasrigi...@gmail.com wrote: Hi guys, I'm trying to do the Spark SQL Programming Guide but after the:

Re: Spark SQL Programming Guide - registerTempTable Error

2014-11-23 Thread Denny Lee
It sort of depends on your environment. If you are running on your local environment, I would just download the latest Spark 1.1 binaries and you'll be good to go. If its a production environment, it sort of depends on how you are setup (e.g. AWS, Cloudera, etc.) On Sun Nov 23 2014 at 11:27:49

Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava

2014-11-25 Thread Denny Lee
To determine if this is a Windows vs. other configuration, can you just try to call the Spark-class.cmd SparkSubmit without actually referencing the Hadoop or Thrift server classes? On Tue Nov 25 2014 at 5:42:09 PM Judy Nash judyn...@exchange.microsoft.com wrote: I traced the code and used

Re: spark-submit on YARN is slow

2014-12-05 Thread Denny Lee
My submissions of Spark on YARN (CDH 5.2) resulted in a few thousand steps. If I was running this on standalone cluster mode the query finished in 55s but on YARN, the query was still running 30min later. Would the hard coded sleeps potentially be in play here? On Fri, Dec 5, 2014 at 11:23 Sandy

Re: spark-submit on YARN is slow

2014-12-05 Thread Denny Lee
, and --num-executors arguments? When running against a standalone cluster, by default Spark will make use of all the cluster resources, but when running against YARN, Spark defaults to a couple tiny executors. -Sandy On Fri, Dec 5, 2014 at 11:32 AM, Denny Lee denny.g@gmail.com wrote: My

Re: spark-submit on YARN is slow

2014-12-05 Thread Denny Lee
Okay, my bad for not testing out the documented arguments - once I use the correct ones, the query completes in ~55s (I can probably make it faster). Thanks for the help, eh?! On Fri Dec 05 2014 at 10:34:50 PM Denny Lee denny.g@gmail.com wrote: Sorry for the delay in my response

Spark on YARN memory utilization

2014-12-06 Thread Denny Lee
This is perhaps more of a YARN question than a Spark question but i was just curious to how is memory allocated in YARN via the various configurations. For example, if I spin up my cluster with 4GB with a different number of executors as noted below 4GB executor-memory x 10 executors = 46GB

Re: Spark on YARN memory utilization

2014-12-06 Thread Denny Lee
* executorMemory. When you set executor memory, the yarn resource request is executorMemory + yarnOverhead. - Arun On Sat, Dec 6, 2014 at 4:27 PM, Denny Lee denny.g@gmail.com wrote: This is perhaps more of a YARN question than a Spark question but i was just curious to how is memory allocated
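The two effects described in this thread compose into simple arithmetic: YARN requests executorMemory plus an overhead, then rounds the request up to a multiple of `yarn.scheduler.minimum-allocation-mb` (default 1024). A sketch of that calculation; the 384 MB overhead matches Spark 1.x defaults and is illustrative, since the exact overhead rule varies by version:

```scala
// Container size YARN actually grants for a requested executor memory.
def containerSizeMb(executorMemoryMb: Int,
                    overheadMb: Int = 384,
                    minAllocationMb: Int = 1024): Int = {
  val requested = executorMemoryMb + overheadMb
  // round up to the nearest multiple of the minimum allocation
  ((requested + minAllocationMb - 1) / minAllocationMb) * minAllocationMb
}
```

With these assumed defaults, a 4096 MB executor requests 4480 MB and is granted a 5120 MB container, which is why cluster memory disappears faster than `executor-memory x num-executors` suggests.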

Re: Spark on YARN memory utilization

2014-12-09 Thread Denny Lee
Thanks Sandy! On Mon, Dec 8, 2014 at 23:15 Sandy Ryza sandy.r...@cloudera.com wrote: Another thing to be aware of is that YARN will round up containers to the nearest increment of yarn.scheduler.minimum-allocation-mb, which defaults to 1024. -Sandy On Sat, Dec 6, 2014 at 3:48 PM, Denny Lee

Re: Spark-SQL JDBC driver

2014-12-11 Thread Denny Lee
Yes, that is correct. A quick reference on this is the post https://www.linkedin.com/pulse/20141007143323-732459-an-absolutely-unofficial-way-to-connect-tableau-to-sparksql-spark-1-1?_mSplash=1 with the pertinent section being: It is important to note that when you create Spark tables (for

Re: Spark SQL Roadmap?

2014-12-13 Thread Denny Lee
Hi Xiaoyong, SparkSQL has already been released and has been part of the Spark code-base since Spark 1.0. The latest stable release is Spark 1.1 (here's the Spark SQL Programming Guide http://spark.apache.org/docs/1.1.0/sql-programming-guide.html) and we're currently voting on Spark 1.2. Hive

Limit the # of columns in Spark Scala

2014-12-14 Thread Denny Lee
I have a large number of files within HDFS that I would like to do a group by statement ala val table = sc.textFile("hdfs://") val tabs = table.map(_.split("\t")) I'm trying to do something similar to tabs.map(c => (c._(167), c._(110), c._(200)) where I create a new RDD that only has but that isn't
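The snag in the attempt above is the accessor syntax: elements of the `Array[String]` produced by `split` are read with `c(i)`, not `c._(i)`. A self-contained sketch of the working projection, using small indices and an in-memory stand-in for the RDD so it runs without a cluster; substitute 167, 110, 200 against the real data:

```scala
// Stand-in for sc.textFile(...): two tab-separated records.
val table = Seq("a\tb\tc\td", "e\tf\tg\th")
val tabs = table.map(_.split("\t"))

// Project a subset of columns into tuples: Array access is c(i).
val projected = tabs.map(c => (c(0), c(2), c(3)))
```

The same `map` body works unchanged on an RDD of split lines, since RDD.map has the same shape as Seq.map here.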

Re: Limit the # of columns in Spark Scala

2014-12-14 Thread Denny Lee
looks like the way to go given the context. What's not working? Kr, Gerard On Dec 14, 2014 5:17 PM, Denny Lee denny.g@gmail.com wrote: I have a large of files within HDFS that I would like to do a group by statement ala val table = sc.textFile(hdfs://) val tabs = table.map(_.split

Re: Limit the # of columns in Spark Scala

2014-12-14 Thread Denny Lee
Yes - that works great! Sorry for implying I couldn't. Was just more flummoxed that I couldn't make the Scala call work on its own. Will continue to debug ;-) On Sun, Dec 14, 2014 at 11:39 Michael Armbrust mich...@databricks.com wrote: BTW, I cannot use SparkSQL / case right now because my table
