To: lee
Cc: user@spark.apache.org
Subject: Re: What is the range of the PageRank value of graphx
From the docs:
Note that this is not the "normalized" PageRank and as a consequence pages
that have no inlinks will have a PageRank of alpha. In particular, the pageranks may
When I calculate PageRank using HugeGraph, each PageRank value is less than 1
and the values sum to 1. However, the PageRank values from GraphX are often
greater than 1, so what is the range of the GraphX PageRank value?
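For what it's worth, if you want GraphX values on the same scale as HugeGraph's, you can normalize the ranks yourself. A minimal sketch (the edge-list path is hypothetical):

  import org.apache.spark.graphx.GraphLoader

  val graph = GraphLoader.edgeListFile(sc, "hdfs:///path/to/edges.txt") // hypothetical path
  val ranks = graph.pageRank(0.0001).vertices  // (VertexId, Double) pairs; values can exceed 1
  val total = ranks.map(_._2).sum()
  val normalized = ranks.mapValues(_ / total)  // normalized ranks now sum to 1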
李杰
leedd1...@163.com
Hi All,
I'm getting the following error when I execute start-master.sh which also
invokes spark-class at the end.
Failed to find Spark assembly in /root/spark/assembly/target/scala-2.10/
You need to build Spark with 'sbt/sbt assembly' before running this program.
After digging into the
to the jar itself, so no need for random class paths.
On Tue, Mar 25, 2014 at 1:47 PM, Andrew Lee alee...@hotmail.com wrote:
Hi All,
I'm getting the following error when I execute start-master.sh which also
invokes spark-class at the end.
Failed to find Spark assembly in /root/spark/assembly
If you have any questions on helping to get a Spark Meetup off the ground,
please do not hesitate to ping me (denny.g@gmail.com). I helped jump start
the one here in Seattle (and tangentially have been helping the Vancouver and
Denver ones as well). HTH!
On March 31, 2014 at 12:35:38
I’ve been able to get CDH5 up and running on EC2 and according to Cloudera
Manager, Spark is running healthy.
But when I try to run spark-shell, I eventually get the error:
14/04/02 07:18:18 INFO client.AppClient$ClientActor: Connecting to master
spark://ip-172-xxx-xxx-xxx:7077...
14/04/02
You may also want to check out Paco Nathan's Introduction to Spark courses:
http://liber118.com/pxn/
On May 1, 2014, at 8:20 AM, Mayur Rustagi mayur.rust...@gmail.com wrote:
Hi Nicholas,
We provide hands-on training on Spark and its associated ecosystem.
We gave it recently at a
Hi All,
I encountered this problem when a firewall is enabled between spark-shell
and the Workers.
When I launch spark-shell in yarn-client mode, I notice that Workers in the
YARN containers are trying to talk to the driver (spark-shell); however, the
firewall is not open, which caused
-0400
Subject: Re: spark-shell driver interacting with Workers in YARN mode -
firewall blocking communication
From: yana.kadiy...@gmail.com
To: user@spark.apache.org
I think what you want to do is set spark.driver.port to a fixed port.
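A minimal sketch of what that looks like (the port number is hypothetical; pick one your firewall allows):

  import org.apache.spark.{SparkConf, SparkContext}

  // pin the driver port so a firewall rule can be opened for it;
  // otherwise Spark picks a random ephemeral port on every launch
  val conf = new SparkConf()
    .setAppName("FixedDriverPort")
    .set("spark.driver.port", "51000") // hypothetical port
  val sc = new SparkContext(conf)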
On Fri, May 2, 2014 at 1:52 PM, Andrew Lee alee...@hotmail.com
We’ve had some pretty awesome presentations at the Seattle Spark Meetup - here
are the links to the various slides:
Seattle Spark Meetup KickOff with DataBricks | Introduction to Spark with Matei
Zaharia and Pat McDonough
Learnings from Running Spark at Twitter sessions
Ben Hindman’s Mesos
.nabble.com/Securing-Spark-s-Network-tp4832p4984.html
[2] http://en.wikipedia.org/wiki/Ephemeral_port
[3]
http://www.cyberciti.biz/tips/linux-increase-outgoing-network-sockets-range.html
Jacob D. Eisinger
IBM Emerging Technologies
jeis...@us.ibm.com - (512) 286-6075
Andrew Lee ---05/02/2014
pairs
// set parallelism to 1 to keep the file from being partitioned
sc.makeRDD(kv, 1)
  .saveAsSequenceFile(path)
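For context, a self-contained version of that snippet (kv and path are hypothetical stand-ins for the elided text above):

  // hypothetical data and output path
  val kv = Seq(("key1", "value1"), ("key2", "value2"))
  val path = "hdfs:///tmp/seqfile-example"

  // parallelism 1 keeps the output in a single partition, hence a single file
  sc.makeRDD(kv, 1).saveAsSequenceFile(path)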
Does anyone have any pointers on how to get past this?
Thanks,
--
Allen Lee
Software Engineer
MediaCrossing Inc.
Andrew Lee ---05/04/2014 09:57:08 PM---Hi Jacob, Taking both concerns into
account, I'm actually thinking about using a separate subnet to
From: Andrew Lee alee...@hotmail.com
To: user@spark.apache.org user@spark.apache.org
Date: 05/04/2014 09:57 PM
Subject
Please check JAVA_HOME. Usually it should point to /usr/java/default on
CentOS/Linux.
or FYI: http://stackoverflow.com/questions/1117398/java-home-directory
Date: Tue, 6 May 2014 00:23:02 -0700
From: sln-1...@163.com
To: u...@spark.incubator.apache.org
Subject: run spark0.9.1 on yarn with
Does anyone know if:
./bin/spark-shell --master yarn
is running yarn-cluster or yarn-client by default?
Based on the source code
(./core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala):
if (args.deployMode == "cluster" && args.master.startsWith("yarn")) {
  args.master = "yarn-cluster"
}
:
if (args.deployMode != "cluster" && args.master.startsWith("yarn")) {
  args.master = "yarn-client"
}
2014-05-21 10:57 GMT-07:00 Andrew Lee alee...@hotmail.com:
Does anyone know if:
./bin/spark-shell --master yarn
is running yarn-cluster or yarn-client by default?
Based on the source code:
./core/src
For those who were not able to attend the last Seattle Spark Meetup, we had a
great session by Claudiu Barbura on xPatterns on Spark, Shark, Tachyon, and
Mesos - you can find the slides at:
http://www.slideshare.net/ClaudiuBarbura/seattle-spark-meetup-may-2014.
As well, check out the next
12.04 AMI... that
might be a good place to start. But if there is a straightforward way to
make them compatible with 2.6 we should do that.
For r3.large, we can add that to the script. It's a newer type. Any
interest in contributing this?
- Patrick
On May 30, 2014 5:08 AM, Jeremy Lee
to bite
the bullet and start building my own AMIs from scratch... if anyone can
save me from that, I'd be most grateful.
--
Jeremy Lee BCompSci(Hons)
The Unorthodox Engineers
.html
--
Jeremy Lee BCompSci(Hons)
The Unorthodox Engineers
Jeremy Lee BCompSci(Hons)
The Unorthodox Engineers
/10.100.75.70:38485
--
Jeremy Lee BCompSci(Hons)
The Unorthodox Engineers
-a that allows you to give a specific
AMI. This flag is just an internal tool that we use for testing when
we spin new AMIs. Users can't set that to an arbitrary AMI because we
tightly control things like the Java and OS versions, libraries, etc.
On Sun, Jun 1, 2014 at 12:51 AM, Jeremy Lee
/SPARK-1990 to track
this.
Matei
On Jun 1, 2014, at 6:14 PM, Jeremy Lee unorthodox.engine...@gmail.com
wrote:
Sort of.. there were two separate issues, but both related to AWS..
I've sorted the confusion about the Master/Worker AMI ... use the version
chosen by the scripts. (and use
, I'm sure I'll get there. But I do understand the
implications of a mixed functional-imperative language with closures and
lambdas. That is serious voodoo.
--
Jeremy Lee BCompSci(Hons)
The Unorthodox Engineers
://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22
The name is spark-streaming-twitter_2.10
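For reference, the corresponding sbt dependency would look something like this (the version is an assumption; match it to your Spark version):

  libraryDependencies += "org.apache.spark" % "spark-streaming-twitter_2.10" % "1.0.0"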
On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee
unorthodox.engine...@gmail.com wrote:
Man, this has been hard going. Six days, and I finally got a Hello World App
--
Jeremy Lee BCompSci(Hons)
The Unorthodox Engineers
if creating Uberjars takes this
long every... single... time...
On Thu, Jun 5, 2014 at 8:52 AM, Jeremy Lee unorthodox.engine...@gmail.com
wrote:
Thanks Patrick!
Uberjars. Cool. I'd actually heard of them. And thanks for the link to the
example! I shall work through that today.
I'm still learning sbt
, 2014 at 5:46 PM, Nick Pentreath nick.pentre...@gmail.com
wrote:
Great - well we do hope we hear from you, since the user list is for
interesting success stories and anecdotes, as well as blog posts etc too :)
On Thu, Jun 5, 2014 at 9:40 AM, Jeremy Lee unorthodox.engine...@gmail.com
wrote
!
--
Jeremy Lee BCompSci(Hons)
The Unorthodox Engineers
Nope, sorry, nevermind!
I looked at the source, and it was pretty obvious that it didn't implement
that yet, so I've ripped the classes out and am mutating them into new
receivers right now...
... starting to get the hang of this.
On Fri, Jun 6, 2014 at 1:07 PM, Jeremy Lee unorthodox.engine
and the StreamingContext uses the network
to read words, but as I said, nothing comes out.
I tried changing the .print() to .saveAsTextFiles(), and I AM getting a
file, but nothing is in it other than a _temporary subdir.
I'm sure I'm confused here, but not sure where. Help?
--
Jeremy Lee
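For reference, a minimal network word-count sketch (assuming Spark Streaming 1.x; the receiver itself occupies a core, so local mode needs at least local[2], and nothing comes out until start() is called):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // at least 2 local cores: one for the socket receiver, one for processing
  val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
  val ssc = new StreamingContext(conf, Seconds(1))
  val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
  words.map((_, 1)).reduceByKey(_ + _).print()
  ssc.start()             // without this, no batch is ever produced
  ssc.awaitTermination()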
persistent data for a streaming app?
(Across restarts) And to clean up on termination?
--
Jeremy Lee BCompSci(Hons)
The Unorthodox Engineers
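For what it's worth, checkpointing is the usual answer for surviving restarts. A sketch, assuming a Streaming release that has StreamingContext.getOrCreate (the path and interval are hypothetical; reuses the shell's sc):

  import org.apache.spark.streaming.{Seconds, StreamingContext}

  def createContext(): StreamingContext = {
    val ssc = new StreamingContext(sc, Seconds(10))
    ssc.checkpoint("hdfs:///checkpoints/myapp") // metadata and state live here
    // build DStreams here
    ssc
  }

  // on restart, recover from the checkpoint instead of rebuilding from scratch
  val ssc = StreamingContext.getOrCreate("hdfs:///checkpoints/myapp", createContext _)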
I read it more carefully, and window() might actually work for some other
stuff like logs. (assuming I can have multiple windows with entirely
different attributes on a single stream..)
Thanks for that!
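And yes, multiple windows with different durations can hang off a single DStream. A minimal sketch (the socket source and durations are hypothetical; window and slide durations must be multiples of the batch interval):

  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val ssc = new StreamingContext(sc, Seconds(10))
  val errors = ssc.socketTextStream("localhost", 9999).filter(_.contains("ERROR"))
  errors.window(Seconds(60), Seconds(10)).count().print()   // last minute, every 10s
  errors.window(Seconds(600), Seconds(60)).count().print()  // last 10 minutes, every minute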
On Sun, Jun 8, 2014 at 11:11 PM, Jeremy Lee unorthodox.engine...@gmail.com
wrote:
Yes
I shut down my first (working) cluster and brought up a fresh one... and
It's been a bit of a horror and I need to sleep now. Should I be worried
about these errors? Or did I just have the old log4j.config tuned so I
didn't see them?
I
14/06/08 16:32:52 ERROR scheduler.JobScheduler: Error
of learning maven, if it means I never have to use sbt
again. Does it mean that?
--
Jeremy Lee BCompSci(Hons)
The Unorthodox Engineers
on that issue. Let me know if I can help with testing and whatnot.
--
Jeremy Lee BCompSci(Hons)
The Unorthodox Engineers
a 1.0.1 release soon (this patch being one of the main reasons),
but if you are itching for this sooner, you can just checkout the head
of branch-1.0 and you will be able to use r3.XXX instances.
- Patrick
On Tue, Jun 17, 2014 at 4:17 PM, Jeremy Lee
unorthodox.engine...@gmail.com wrote
, 2014 at 9:29 PM, Jeremy Lee
unorthodox.engine...@gmail.com wrote:
I am about to spin up some new clusters, so I may give that a go... any
special instructions for making them work? I assume I use the
--spark-git-repo= option on the spark-ec2 command. Is it as easy as
concatenating your
Hi All,
Has anyone run into the same problem? By looking at the source code in the
official release (rc11), this property setting is false by default;
however, I'm seeing the .sparkStaging folder remain on HDFS, causing it
to fill up the disk pretty fast since SparkContext deploys
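For reference, the property being described is presumably spark.yarn.preserve.staging.files (an assumption based on the description), set in spark-defaults.conf:

  spark.yarn.preserve.staging.files  false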
Forgot to mention that I am using spark-submit to submit jobs; a verbose
mode printout looks like this with the SparkPi example. The .sparkStaging
folder won't be deleted. My thinking is that this should be part of the staging
and should be cleaned up as well when sc gets terminated.
I checked the source code; it looks like it was re-added based on JIRA
SPARK-1588, but I don't know if there's any test case associated with it?
SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN.
Sandy Ryza sa...@cloudera.com
2014-04-29 12:54:02 -0700
By any chance do you have HDP 2.1 installed? You may need to install the utils
and update the env variables per
http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows
On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev
kudryavtsev.konstan...@gmail.com wrote:
issue.
On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev
kudryavtsev.konstan...@gmail.com wrote:
No, I don’t
why do I need to have HDP installed? I don’t use Hadoop at all and I’d
like to read data from local filesystem
On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com
Hi Christophe,
Make sure you have 3 slashes in the hdfs scheme.
e.g.
hdfs:///server_name:9000/user/user_name/spark-events
and in the spark-defaults.conf as well:
spark.eventLog.dir=hdfs:///server_name:9000/user/user_name/spark-events
Date: Thu, 19 Jun 2014 11:18:51 +0200
From:
=hdinsight
2) put this file into d:\winutil\bin
3) add in my test: System.setProperty("hadoop.home.dir", "d:\\winutil\\")
After that, the test runs.
Thank you,
Konstantin Kudryavtsev
On Wed, Jul 2, 2014 at 10:24 PM, Denny Lee denny.g@gmail.com wrote:
You don't actually need it per se - it's just that some
Thanks! Will take a look at this later today. HTH!
On Jul 3, 2014, at 11:09 AM, Kostiantyn Kudriavtsev
kudryavtsev.konstan...@gmail.com wrote:
Hi Denny,
just created https://issues.apache.org/jira/browse/SPARK-2356
On Jul 3, 2014, at 7:06 PM, Denny Lee denny.g@gmail.com wrote
Hi Kudryavtsev,
Here's what I am doing as a common practice and reference. I don't want to say
it is best practice, since it requires a lot of customer experience and
feedback, but from a development and operating standpoint, it will be great to
separate the YARN container logs from the Spark
Build: Spark 1.0.0 rc11 (git commit tag:
2f1dc868e5714882cf40d2633fb66772baf34789)
Hi All,
When I enabled the spark-defaults.conf for the eventLog, spark-shell broke
while spark-submit works.
I'm trying to create a separate directory per user to keep track of their own
Spark job event
As mentioned, SPARK_CLASSPATH is deprecated in Spark 1.0+.
Try to use the --driver-class-path:
./bin/spark-shell --driver-class-path yourlib.jar:abc.jar:xyz.jar
Don't use a glob (*); specify the JARs one by one, separated by colons.
Date: Wed, 9 Jul 2014 13:45:07 -0700
From: kat...@cs.pitt.edu
Subject: SPARK_CLASSPATH Warning
Ok, I found it on JIRA SPARK-2390:
https://issues.apache.org/jira/browse/SPARK-2390
So it looks like this is a known issue.
From: alee...@hotmail.com
To: user@spark.apache.org
Subject: spark-1.0.0-rc11 2f1dc868 spark-shell not honoring --properties-file
option?
Date: Tue, 8 Jul 2014 15:17:00
We're coming off a great Seattle Spark Meetup session with Evan Chan
(@evanfchan) Interactive OLAP Queries with @ApacheSpark and #Cassandra
(http://www.slideshare.net/EvanChan2/2014-07olapcassspark) at Whitepages.
Now, we're proud to announce that our next session is Spark at eBay -
Hi All,
Currently, if you are running the Spark HiveContext API with Hive 0.12, it won't
work due to the following 2 libraries, which are not consistent with Hive 0.12
and Hadoop. (Hive libs align with Hadoop libs and, as a common practice, they
should be consistent in order to interoperate.)
for Hive-on-Spark now.
On Mon, Jul 21, 2014 at 6:27 PM, Andrew Lee alee...@hotmail.com wrote:
Hive and Hadoop are using an older version of the guava libraries (11.0.1),
whereas Spark Hive is using guava 14.0.1+.
The community isn't willing to downgrade to 11.0.1 which is the current
version
-cassandra-connector rather than the hadoop back end?
Cheers,
Lee
Hi Jianshi,
Could you provide which HBase version you're using?
By the way, have you done a quick sanity check on whether the Workers can
access HBase? Were you able to manually write one record to HBase with the
serialize function? Hardcode it and test?
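Something like the following would do as a driver-side sanity check (a sketch against the pre-1.0 HBase client API; the table and column names are hypothetical):

  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.hbase.client.{HTable, Put}
  import org.apache.hadoop.hbase.util.Bytes

  val hbaseConf = HBaseConfiguration.create()
  val table = new HTable(hbaseConf, "test_table")  // hypothetical table
  val put = new Put(Bytes.toBytes("row1"))
  put.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes("value"))
  table.put(put)                                   // one hardcoded record
  table.close()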
From: jianshi.hu...@gmail.com
Date: Fri, 25 Jul 2014
Hi All,
Not sure if anyone has run into this problem, but it exists in Spark 1.0.0
when you specify the location in conf/spark-defaults.conf for
spark.eventLog.dir hdfs:///user/$USER/spark/logs
to use the $USER env variable.
For example, I'm running the command with user 'test'.
In
2014-07-28 12:40 GMT-07:00 Andrew Lee alee...@hotmail.com:
Hi All,
Not sure if anyone has run into this problem, but it exists in Spark 1.0.0
when you specify the location in conf/spark-defaults.conf for
spark.eventLog.dir hdfs:///user/$USER/spark/logs
to use the $USER env variable
files explicitly to --jars option and it worked fine.
The "Caused by..." messages were actually found in the YARN logs; I think it
might be useful if I could see them from the console that runs spark-submit.
Would that be possible?
Jianshi
On Sat, Jul 26, 2014 at 7:08 AM, Andrew Lee alee
/user/hive/warehouse)
On Thu, Jul 31, 2014 at 8:05 AM, Andrew Lee <alee526@> wrote:
Hi All,
It has been awhile, but what I did to make it work was to make sure of the
following:
1. Hive is working when you run Hive CLI and JDBC via Hiveserver2
2. Make sure you have
For those who were not able to attend the Seattle Spark Meetup - Spark at eBay
- Troubleshooting the Everyday Issues, the slides have been now posted at:
http://files.meetup.com/12063092/SparkMeetupAugust2014Public.pdf.
Enjoy!
Denny
Apologies, but we had restricted downloading of the slides to Seattle Spark
Meetup members only - we actually meant to share them with everyone. We have
since fixed this and now you can download them. HTH!
On August 14, 2014 at 18:14:35, Denny Lee (denny.g@gmail.com) wrote
Hi,
I've used HDFS 2.3.0-cdh5.0.1, Mesos 0.19.1, and a recompiled Spark 1.0.2.
For security reasons, we run HDFS and Mesos as hdfs, which is an account
name and not in the root group, and a non-root user submits Spark jobs on
Mesos. With no-switch_user, a simple job, which only reads data from
Quick question - is there a handy sample / example of how to use the LDA
algorithm within Spark MLLib?
Thanks!
Denny
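(LDA landed in MLlib in Spark 1.3, after this thread. Once it is available, a minimal sketch looks like this; the tiny corpus is hypothetical, with documents as (id, term-count vector) pairs:)

  import org.apache.spark.mllib.clustering.LDA
  import org.apache.spark.mllib.linalg.Vectors

  val corpus = sc.parallelize(Seq(
    (0L, Vectors.dense(1.0, 2.0, 0.0, 5.0)),
    (1L, Vectors.dense(0.0, 3.0, 1.0, 0.0))
  ))
  val model = new LDA().setK(2).setMaxIterations(20).run(corpus)
  println(model.topicsMatrix) // vocabSize x k matrix of topic-term weights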
Lee alee...@hotmail.com wrote:
Hopefully there could be some progress on SPARK-2420. It looks like shading
may be the favored solution over downgrading.
Any idea when this will happen? Could it happen in Spark 1.1.1 or Spark
1.1.2?
By the way, regarding bin/spark-sql? Is this more
I’m currently using the Spark 1.1 branch and have been able to get the Thrift
service up and running. The quick questions were whether I should be able to use
the Thrift service to connect to SparkSQL generated tables and/or Hive tables?
As well, by any chance do we have any documents that
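For reference, here is what a JDBC connection to the Thrift server looks like (a sketch; it assumes the Hive JDBC driver is on the classpath, and the host, port, and user are hypothetical):

  import java.sql.DriverManager

  Class.forName("org.apache.hive.jdbc.HiveDriver")
  val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
  val rs = conn.createStatement().executeQuery("SHOW TABLES")
  while (rs.next()) println(rs.getString(1))
  conn.close()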
Oh, forgot to add the managed libraries and the Hive libraries within the
CLASSPATH. As soon as I did that, we’re good to go now.
On August 29, 2014 at 22:55:47, Denny Lee (denny.g@gmail.com) wrote:
My issue is similar to the issue as noted
http://mail-archives.apache.org/mod_mbox
Oh, you may be running into an issue with your MySQL setup, actually. Try running:
alter database metastore_db character set latin1
so that way Hive (and the Spark HiveContext) can execute properly against the
metastore.
On August 29, 2014 at 04:39:01, arthur.hk.c...@gmail.com
When I start the thrift server (on Spark 1.1 RC4) via:
./sbin/start-thriftserver.sh --master spark://hostname:7077 --driver-class-path
$CLASSPATH
It appears that the thrift server is starting off of localhost as opposed to
hostname. I have set the spark-env.sh to use the hostname, modified the
your-port
This behavior is inherited from Hive since Spark SQL Thrift server is a variant
of HiveServer2.
On Wed, Sep 3, 2014 at 10:47 PM, Denny Lee denny.g@gmail.com wrote:
When I start the thrift server (on Spark 1.1 RC4) via:
./sbin/start-thriftserver.sh --master spark://hostname:7077
Actually, when registering the table, it is only available within the sc
context you are running it in. For Spark 1.1, the method name is changed to
registerTempTable to better reflect that.
The Thrift server runs as a separate process, meaning that it cannot
see any of the
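A sketch of the distinction (Spark 1.1 era API; the path and table names are hypothetical):

  import org.apache.spark.sql.hive.HiveContext

  val hc = new HiveContext(sc)
  // temp table: visible only inside this driver process
  hc.jsonFile("hdfs:///path/data.json").registerTempTable("events_tmp")
  // Hive table: recorded in the metastore, so a separately running
  // Thrift server process can see it
  hc.sql("CREATE TABLE events AS SELECT * FROM events_tmp")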
I’m not sure if I’m completely answering your question here but I’m currently
working (on OSX) with Hadoop 2.5 and I used the Spark 1.1 with Hadoop 2.4
without any issues.
On September 11, 2014 at 18:11:46, Haopu Wang (hw...@qilinsoft.com) wrote:
I see the binary packages include hadoop 1,
registerTempTable you mentioned works on SqlContext instead of HiveContext.
Thanks,
Du
On 9/10/14, 1:21 PM, Denny Lee denny.g@gmail.com wrote:
Actually, when registering the table, it is only available within the sc
context you are running it in. For Spark 1.1, the method name is changed
, but in Spark 1.1.0, there are separate packages for hadoop 2.3 and 2.4.
That implies some difference in Spark according to hadoop version.
From: Denny Lee [mailto:denny.g@gmail.com]
Sent: Friday, September 12, 2014 9:35 AM
To: user@spark.apache.org; Haopu Wang; d...@spark.apache.org
to read from HDFS, you’ll need to build Spark against the
specific HDFS version in your environment.”
Did you try to read a hadoop 2.5.0 file using Spark 1.1 with hadoop 2.4?
Thanks!
From: Denny Lee [mailto:denny.g@gmail.com]
Sent: Friday, September 12, 2014 10:00 AM
To: Patrick
When you re-ran sbt, did you clear out the packages first and ensure that
the datanucleus jars were generated within lib_managed? I remember
having to do that when I was testing out different configs.
On Thu, Sep 11, 2014 at 10:50 AM, alexandria1101
alexandria.shea...@gmail.com wrote:
Could you provide some context about running this in yarn-cluster mode?
The Thrift server that's included within Spark 1.1 is based on Hive 0.12.
Hive has been able to work against YARN since Hive 0.10. So when you start
the thrift server, provided you copied the hive-site.xml over to the Spark
The registered table is stored within the Spark context itself. To have the
table available for the Thrift server to access, you can save the sc
table into the Hive context so that the Thrift server process can see the
table. If you are using derby as your metastore, then the
This seems similar to a related Windows issue concerning Python, where
pyspark couldn't find Python because the PYTHONSTARTUP environment
variable wasn't set - by any chance could this be related?
On Wed, Sep 24, 2014 at 7:51 PM, christy 760948...@qq.com wrote:
Hi I have installed standalone on
by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException:
Specified key was too long; max key length is 767 bytes
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Should I use HIVE 0.12.0 instead of HIVE 0.13.1?
Regards
Arthur
On 31 Aug, 2014, at 6:01 am, Denny Lee denny.g
--jars (ADD_JARS) is a special class-loading path for Spark, while
--driver-class-path (SPARK_CLASSPATH) is captured by the startup scripts and
appended to the classpath settings used to start the JVM running the
driver
You can reference
https://www.concur.com/blog/en-us/connect-tableau-to-sparksql
QQ - did you download the Spark 1.1 binaries that include the Hadoop jars?
Does this happen if you're using the Spark 1.1 binaries that do not include
the Hadoop jars?
On Wed, Oct 29, 2014 at 11:31 AM, Ron Ayoub ronalday...@live.com wrote:
Apparently Spark does require Hadoop even if you do not
When you are starting the thrift server service - are you connecting to it
locally or is this on a remote server when you use beeline and/or Tableau?
On Thu, Oct 30, 2014 at 8:00 AM, Bojan Kostic blood9ra...@gmail.com wrote:
I use the beta SQL ODBC driver from Databricks.
I created a simple Spark Streaming program - it received numbers and
computed averages and sent the results to Kafka.
It worked perfectly in local mode as well as standalone master/slave mode
across a two-node cluster.
It did not work however in yarn-client or yarn-cluster mode.
The job was
extraction job against multiple data sources via Hadoop streaming.
Another good call-out for utilizing Scala within Spark is that most of the
Spark code is written in Scala.
On Sat, Nov 22, 2014 at 08:12 Denny Lee denny.g@gmail.com wrote:
There are various scenarios where traditional Hadoop
By any chance are you using Spark 1.0.2? registerTempTable was introduced
from Spark 1.1+ while for Spark 1.0.2, it would be registerAsTable.
On Sun Nov 23 2014 at 10:59:48 AM riginos samarasrigi...@gmail.com wrote:
Hi guys,
I'm trying to work through the Spark SQL Programming Guide, but after the:
It sort of depends on your environment. If you are running on your local
environment, I would just download the latest Spark 1.1 binaries and you'll
be good to go. If its a production environment, it sort of depends on how
you are setup (e.g. AWS, Cloudera, etc.)
On Sun Nov 23 2014 at 11:27:49
To determine if this is a Windows vs. other configuration, can you just try
to call the Spark-class.cmd SparkSubmit without actually referencing the
Hadoop or Thrift server classes?
On Tue Nov 25 2014 at 5:42:09 PM Judy Nash judyn...@exchange.microsoft.com
wrote:
I traced the code and used
My submissions of Spark on YARN (CDH 5.2) resulted in a few thousand steps.
If I was running this on standalone cluster mode the query finished in 55s
but on YARN, the query was still running 30min later. Would the hard coded
sleeps potentially be in play here?
On Fri, Dec 5, 2014 at 11:23 Sandy
, and --num-executors
arguments? When running against a standalone cluster, by default Spark
will make use of all the cluster resources, but when running against YARN,
Spark defaults to a couple tiny executors.
-Sandy
On Fri, Dec 5, 2014 at 11:32 AM, Denny Lee denny.g@gmail.com
wrote:
My
Okay, my bad for not testing out the documented arguments - once I used the
correct ones, the query completed in ~55s (I can probably make it
faster). Thanks for the help, eh?!
On Fri Dec 05 2014 at 10:34:50 PM Denny Lee denny.g@gmail.com wrote:
Sorry for the delay in my response
This is perhaps more of a YARN question than a Spark question, but I was
just curious how memory is allocated in YARN via the various
configurations. For example, if I spin up my cluster with 4GB with a
different number of executors, as noted below:
4GB executor-memory x 10 executors = 46GB
* executorMemory.
When you set executor memory, the yarn resource request is executorMemory
+ yarnOverhead.
- Arun
On Sat, Dec 6, 2014 at 4:27 PM, Denny Lee denny.g@gmail.com wrote:
This is perhaps more of a YARN question than a Spark question, but I was
just curious how memory is allocated
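To make the arithmetic concrete (a sketch, assuming the era's default spark.yarn.executor.memoryOverhead of 384 MB): each container requests roughly 4096 MB + 384 MB = 4480 MB, so ten executors reserve about 44.8 GB before YARN's container rounding - in the neighborhood of the 46 GB observed above, with the exact figure depending on that rounding.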
Thanks Sandy!
On Mon, Dec 8, 2014 at 23:15 Sandy Ryza sandy.r...@cloudera.com wrote:
Another thing to be aware of is that YARN will round up containers to the
nearest increment of yarn.scheduler.minimum-allocation-mb, which defaults
to 1024.
-Sandy
On Sat, Dec 6, 2014 at 3:48 PM, Denny Lee
Yes, that is correct. A quick reference on this is the post
https://www.linkedin.com/pulse/20141007143323-732459-an-absolutely-unofficial-way-to-connect-tableau-to-sparksql-spark-1-1?_mSplash=1
with the pertinent section being:
It is important to note that when you create Spark tables (for
Hi Xiaoyong,
SparkSQL has already been released and has been part of the Spark code-base
since Spark 1.0. The latest stable release is Spark 1.1 (here's the Spark
SQL Programming Guide
http://spark.apache.org/docs/1.1.0/sql-programming-guide.html) and we're
currently voting on Spark 1.2.
Hive
I have a large number of files within HDFS that I would like to do a group by
statement on, ala
val table = sc.textFile("hdfs://")
val tabs = table.map(_.split("\t"))
I'm trying to do something similar to
tabs.map(c => (c._(167), c._(110), c._(200)))
where I create a new RDD that only has
but that isn't
looks like
the way to go given the context. What's not working?
Kr, Gerard
On Dec 14, 2014 5:17 PM, Denny Lee denny.g@gmail.com wrote:
I have a large number of files within HDFS that I would like to do a group by
statement on, ala
val table = sc.textFile("hdfs://")
val tabs = table.map(_.split
Yes - that works great! Sorry for implying I couldn't. Was just more
flummoxed that I couldn't make the Scala call work on its own. Will
continue to debug ;-)
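For reference, the form that presumably worked (an assumption about the fix: split() returns an Array[String], which is indexed with parentheses rather than ._()):

  tabs.map(c => (c(167), c(110), c(200)))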
On Sun, Dec 14, 2014 at 11:39 Michael Armbrust mich...@databricks.com
wrote:
BTW, I cannot use SparkSQL / case right now because my table