Hi Tobias,
It seems that repartition can create more executors for the stages following data receiving. However, the number of executors is still far less than what I require (I specify one core per executor). Based on the indices of the executors in the stage, I find that many numbers are missing.
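For reference, a minimal sketch of repartitioning a received stream so that the stages after the receiver can use more cores (host, port, and partition count are hypothetical):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(new SparkConf().setAppName("RepartitionDemo"), Seconds(10))
// A single receiver lands all incoming data on one executor...
val lines = ssc.socketTextStream("somehost", 9999)
// ...so spread it across more partitions before the heavy stages.
val spread = lines.repartition(32)
spread.foreachRDD(rdd => println("partitions: " + rdd.partitions.length))
ssc.start()
ssc.awaitTermination()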
Yup...the Scala version 2.11.0 caused it...with 2.10.4, I could compile both 1.0.1 and HEAD against Hadoop 2.3.0-cdh5.0.2.
On Sat, Jul 19, 2014 at 8:14 PM, Debasish Das debasish.da...@gmail.com wrote:
I compiled Spark 1.0.1 with Hadoop 2.3.0-cdh5.0.2 today...
No issues with the mvn compilation, but my sbt build
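For reference, the Maven invocation would look along these lines (a sketch, assuming the standard Spark 1.0.x build and Scala 2.10.4):

mvn -Dhadoop.version=2.3.0-cdh5.0.2 -DskipTests clean package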
When using the spark-ec2 script with m3.2xlarge instances, /mnt and /mnt2 do not seem to point to the 80 GB SSDs that come with that instance type. Does anybody know whether extra steps are required when using this instance type?
Thanks,
Chris
Hi, I created a demo input:
https://gist.github.com/b0c1/e3721af839feec433b56#file-gistfile1-txt-L10
As you can see on line 10, the JSON is received (JSON or string, it doesn't matter).
After that everything is OK, except that the processing never starts...
Any idea? Please help, guys... I don't have any idea what I
Hi Michael,
I only modified the default Hadoop version to 0.20.2-cdh3u5 and set DEFAULT_HIVE=true in SparkBuild.scala, then ran sbt/sbt assembly.
I run in local standalone mode using sbin/start-all.sh; the Hadoop version is 0.20.2-cdh3u5.
Then I use spark-shell to execute Spark code like this:
val sc = new SparkContext(new SparkConf().setAppName("SLA Filter"))
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
val suffix = args(0)
sqlContext.parquetFile("/user/hive/warehouse/xxx_parquet.db/xx001_" +
Hi All,
In a Java-based scenario where we have a large Oracle DB and want to use Spark to do some distributed analysis on the data, how exactly do we go about defining a JDBC connection and querying the data?
thanks,
--
Ahmed Osama Ibrahim
ITSC International
According to the API docs for the pipe operator,
def pipe(command: String): RDD[String]
(http://spark.apache.org/docs/1.0.0/api/scala/org/apache/spark/rdd/RDD.html)
Return an RDD created by piping elements to a forked external process.
However, it's not clear to me: will the outputted RDD capture
Nevermind :) I found my answer in the docs for the PipedRDD
/**
 * An RDD that pipes the contents of each parent partition through an external command
 * (printing them one per line) and returns the output as a collection of strings.
 */
private[spark] class PipedRDD[T: ClassTag](
So, this is
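For reference, a minimal sketch of pipe in action (assuming a POSIX tr is on the workers' PATH):

// Each partition's elements are written to tr's stdin, one per line;
// tr's stdout comes back as the new RDD[String].
val upper = sc.parallelize(Seq("a", "b", "c")).pipe("tr a-z A-Z")
upper.collect() // Array(A, B, C)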
Is this with the 1.0.0 scripts? I believe it's fixed in 1.0.1.
Matei
On Jul 20, 2014, at 1:22 AM, Chris DuBois chris.dub...@gmail.com wrote:
When using the spark-ec2 script with m3.2xlarge instances, /mnt and /mnt2 do not seem to point to the 80 GB SSDs that come with that instance type. Does
I pulled the latest last night. I'm on commit 4da01e3.
On Sun, Jul 20, 2014 at 2:08 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Is this with the 1.0.0 scripts? I believe it's fixed in 1.0.1.
Matei
On Jul 20, 2014, at 1:22 AM, Chris DuBois chris.dub...@gmail.com wrote:
Using the
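For reference, a launch command along these lines (a sketch; key pair, cluster name, and slave count are hypothetical) should mount the m3 ephemeral SSDs correctly once the fixed 1.0.1 scripts are used:

./ec2/spark-ec2 -k mykey -i mykey.pem -t m3.2xlarge -s 4 launch my-cluster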
Hi, Victor
I hit the same issue and posted about it.
In my case, it only happens when I run certain Spark SQL queries on Spark 1.0.1; on Spark 1.0.0 it works properly.
Have you run the same job on Spark 1.0.0?
Sincerely,
Kevin
JiaJia, I checked out the latest 1.0 branch and then did the following steps:
SPARK_HIVE=true sbt/sbt clean assembly
cd examples
../bin/run-example sql.hive.HiveFromSpark
It works well in my local environment.
Your log output shows Invalid method name: 'get_table', which looks like an incompatible jar version.
It was because of the latest change to task serialization:
https://github.com/apache/spark/commit/1efb3698b6cf39a80683b37124d2736ebf3c9d9a
The task size is no longer limited by akka.frameSize, but we show warning messages if the task size is above 100KB. Please check the objects referenced in the
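For reference, a common fix when tasks exceed that size is to move large objects out of the task closure and into a broadcast variable; a minimal sketch (the lookup map is a stand-in for whatever large object the closure captures):

val bigLookup: Map[String, Int] = Map("a" -> 1) // stand-in for a large object
val bcLookup = sc.broadcast(bigLookup)
// Tasks now carry only the small broadcast handle, not the map itself.
sc.parallelize(Seq("a", "b")).map(x => bcLookup.value.getOrElse(x, 0)).collect()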
Hi Guys,
Any simple example of JdbcRDD for a newbie?
--
Ahmed Osama Ibrahim
ITSC International Technology Services Corporation
www.itscorpmd.com
Tel: +1 240 685 1444
Fax: +1 240 668 9841
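For reference, a minimal JdbcRDD sketch against Oracle (connection string, table, and key range are hypothetical; sc is an existing SparkContext, and the query must contain exactly two '?' placeholders for the partition bounds):

import java.sql.DriverManager
import org.apache.spark.rdd.JdbcRDD

val rows = new JdbcRDD(
  sc,
  () => DriverManager.getConnection("jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "pass"),
  "SELECT id, name FROM customers WHERE id >= ? AND id <= ?",
  1L, 1000000L, // lower/upper bound of the partitioning key
  10,           // number of partitions
  rs => (rs.getLong(1), rs.getString(2)))
rows.take(5).foreach(println)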
Hello, what does @DeveloperApi mean?
I saw it appear many times in the Spark source code.
Thx
The javaDoc seems reasonably helpful:
/**
 * A lower-level, unstable API intended for developers.
 *
 * Developer API's might change or be removed in minor versions of Spark.
 */
These would be contrasted with non-Developer (more or less production?) APIs that are deemed to be stable within a
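For reference, the annotation lives in org.apache.spark.annotation and is simply attached to unstable classes or methods; a minimal sketch (the class name is hypothetical):

import org.apache.spark.annotation.DeveloperApi

@DeveloperApi // unstable: may change or disappear between minor releases
class MyExperimentalHook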
On Fri, Jul 18, 2014 at 9:07 PM, ShreyanshB shreyanshpbh...@gmail.com
wrote:
Does the suggested version with in-memory shuffle affect performance too much?
We've observed a 2-3x speedup from it, at least on larger graphs like
twitter-2010 http://law.di.unimi.it/webdata/twitter-2010/ and
In Spark 0.7.3, I used SparkEnv.get.blockManager.getLocal(model) and SparkEnv.get.blockManager.put(model, buf, StorageLevel.MEMORY_ONLY, false) to cache a model object.
When porting to Spark 1.0.1, I found that the SparkEnv.get.blockManager.getLocal and SparkEnv.get.blockManager.put APIs have changed.
Hm, this is not a public API, but you should theoretically be able to use
TestBlockId if you like. Internally, we just use the BlockId's natural
hashing and equality to do lookups and puts, so it should work fine.
However, since it is in no way a public API, it may change even in maintenance
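For reference, a minimal sketch along the lines Aaron describes (internal, non-public API, assuming the 1.0.x putSingle/getSingle variants; model is whatever object you are caching):

import org.apache.spark.SparkEnv
import org.apache.spark.storage.{StorageLevel, TestBlockId}

val blockId = TestBlockId("model")
// Cache one object in the local block manager, then read it back.
SparkEnv.get.blockManager.putSingle(blockId, model, StorageLevel.MEMORY_ONLY)
val cached = SparkEnv.get.blockManager.getSingle(blockId)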
Thanks for your help; the problem is resolved.
As pointed out by Andrew and Meethu, I needed to use spark://vmsparkwin1:7077 rather than the equivalent spark://10.1.3.7:7077 in the spark-submit command.
It appears that the argument to the --master option for spark-submit must match exactly (not
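For reference, the working form of the command (class and jar names are hypothetical):

# The --master URL must match the master's advertised URL exactly (hostname, not the equivalent IP).
./bin/spark-submit --master spark://vmsparkwin1:7077 --class MyApp myapp.jar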
thank you Stephen
-- Original Message --
From: Stephen Boesch java...@gmail.com
Date: Monday, July 21, 2014, 11:55 AM
To: user user@spark.apache.org
Subject: Re: What does @developerApi mean?
The javaDoc seems reasonably helpful:
/**
 * A lower-level, unstable API intended for
thank you Aaron
-- Original Message --
From: Aaron Davidson ilike...@gmail.com
Date: Monday, July 21, 2014, 1:40 PM
To: user user@spark.apache.org
Subject: Re: Which kind of BlockId should I use?
Hm, this is not a public API, but you should theoretically be able to use