Hi all
I was finally able to get this working by setting
SPARK_EXECUTOR_INSTANCES to a high number. However, I am wondering if
this is a bug, because the app gets submitted but ceases to run because it
can't run the desired number of workers. Shouldn't the app be rejected if it
can't be run on the
I guess it's because this example is stateless, so it outputs counts only for the
given RDD. Take a look at the stateful word counter,
StatefulNetworkWordCount.scala.
On Wed, Sep 24, 2014 at 4:29 AM, SK skrishna...@gmail.com wrote:
I execute it as follows:
$SPARK_HOME/bin/spark-submit --master <master url>
Hey Dan,
Thanks for your reply. I have a couple of questions.
1) Were you able to verify that this is because of GC? If yes, then could
you let me know how.
2) If this is GC, then do you know of any tuning I can do to reduce this GC
pause?
Regards,
Samay
On Tue, Sep 23, 2014 at 11:15 PM, Dan
Hi all
Reading through Spark Streaming's custom receiver documentation, it is
recommended that the onStart and onStop methods should not block indefinitely.
However, looking at the source code of KinesisReceiver, the onStart method
calls worker.run, which blocks until the worker is shut down (via a call to
If you look at the code for HdfsWordCount, you see it calls print(),
which defaults to printing 10 elements from each RDD. If you are just
talking about the console output, then it is not expected to print all
words to begin with.
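If you do want every word printed for debugging, a minimal sketch (assuming the wordCounts DStream from that example, and that the stream is small enough to collect to the driver) would be:

wordCounts.foreachRDD { rdd =>
  rdd.collect().foreach(println)  // prints every element, not just the first 10
}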
On Wed, Sep 24, 2014 at 2:29 AM, SK skrishna...@gmail.com wrote:
I
top returns the specified number of largest elements in your RDD.
They are returned to the driver as an Array. If you want to make an
RDD out of them again, call SparkContext.parallelize(...). Make sure
this is what you mean though.
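For example, a quick sketch assuming an RDD[Int] named nums:

val top10 = nums.top(10)              // Array[Int], materialized on the driver
val top10Rdd = sc.parallelize(top10)  // turn it back into an RDD if needed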
On Wed, Sep 24, 2014 at 5:33 AM, Deep Pradhan
Hi,
Does anybody know how to use sortByKey in Scala on an RDD like:
val rddToSave = file.map(l => l.split("\\|")).map(r => (r(34)+"-"+r(3),
r(4), r(10), r(12)))
because I received an error: sortByKey is not a member of
org.apache.spark.rdd.RDD[(String,String,String,String)].
What I try to do
Hi,
sortByKey is only defined for RDD[(K, V)]; each tuple can only have two members, and Spark
will sort by the first member. If you want to use sortByKey, you have to change
your RDD[(String, String, String, String)] into RDD[(String, (String, String,
String))].
Thanks
Jerry
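A minimal sketch of that restructuring, assuming the same field indices as in the original snippet:

val pairs = file.map(l => l.split("\\|"))
  .map(r => (r(34) + "-" + r(3), (r(4), r(10), r(12))))  // (key, value-tuple)
val sorted = pairs.sortByKey()                            // sorts by the String key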
-Original Message-
Hi David,
Can you try val rddToSave = file.map(l => l.split("\\|")).map(r =>
(r(34)+"-"+r(3), (r(4), r(10), r(12)))) ?
That should work.
Liquan
On Wed, Sep 24, 2014 at 1:29 AM, david david...@free.fr wrote:
Hi,
Does anybody know how to use sortByKey in Scala on an RDD like:
val
Hi,
I have a setup (in mind) where data is written to Kafka and this data is
persisted in HDFS (e.g., using Camus) so that I have an all-time archive of
all stream data ever received. Now I want to process that all-time archive
and when I am done with that, continue with the live stream, using
Given this program.. I have the following queries..
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
sparkConf.set("spark.master", "local[2]")
val ssc = new StreamingContext(sparkConf, Seconds(10))
val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.
Hi, I want to extract a subgraph using two clauses, such as (?person worksAt
MIT, MIT locatesIn USA); I have this query. My data is constructed as
triplets, so I assume GraphX will suit it best. But I found out that the subgraph
method only supports checking a single edge, and it seems to me that I
Hello,
if our Hadoop cluster is configured with HA and fs.defaultFS points to a
namespace instead of a namenode hostname - hdfs://namespace_name/ - then
our Spark job fails with an exception. Is there anything to configure, or is it
not implemented?
Exception in thread main
Hi ,
I am getting this weird error while starting Worker.
-bash-4.1$ spark-class org.apache.spark.deploy.worker.Worker
spark://osebi-UServer:59468
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/09/24 16:22:04 INFO worker.Worker: Registered signal
Thanks,
I've already tried this solution but it does not compile (in Eclipse).
I'm surprised to see that in spark-shell, sortByKey works fine with both
forms:
(String,String,String,String)
(String,(String,String,String))
Hi,
On Wed, Sep 24, 2014 at 7:23 PM, Dibyendu Bhattacharya
dibyendu.bhattach...@gmail.com wrote:
So you have a single Kafka topic which has a very high retention period (
which decides the storage capacity of a given Kafka topic), and you want to
process all historical data first using Camus and
Hi ,
I have the following RDD:
val conf = new SparkConf()
  .setAppName("Testing Sorting")
val sc = new SparkContext(conf)
val L = List(
  (new Student("XiaoTao", 80, 29), "I'm Xiaotao"),
  (new Student("CCC", 100, 24), "I'm CCC"),
  (new Student("Jack", 90, 25), "I'm Jack"),
Hey all
I tried the Spark connector with Cassandra and I ran into a problem that I was
blocked on for a couple of weeks. I managed to find a solution to the problem,
but I am not sure whether it was a bug in the connector/Spark or not.
I had three tables in Cassandra (running Cassandra on a 5 node
Thanks for your reply! Specifically, in our development environment, I want to
know whether there is any solution to let every task do just one sub-matrix
multiplication.
After 'groupBy', every node with 16 cores should get 16 pairs of matrices, so
I set every node to 16 tasks, hoping every core does one
Hi there
What is an optimal cluster setup for Spark? Given X amount of resources,
would you favour more worker nodes with fewer resources, or fewer worker nodes
with more resources? Is this application dependent? If so, what are the
things to consider, and what are good practices?
Cheers
Do you have a newline or some other strange character in an argument you
pass to Spark that includes that 61608?
On Wed, Sep 24, 2014 at 4:11 AM, jishnu.prat...@wipro.com wrote:
Hi ,
I am getting this weird error while starting Worker.
-bash-4.1$ spark-class
I was able to read RC files with the following line:
val file: RDD[(LongWritable, BytesRefArrayWritable)] =
  sc.hadoopFile("hdfs://day=2014-08-10/hour=00/",
    classOf[RCFileInputFormat[LongWritable, BytesRefArrayWritable]],
    classOf[LongWritable], classOf[BytesRefArrayWritable], 500)
Try with
See the inline response.
On Wed, Sep 24, 2014 at 4:05 PM, Reddy Raja areddyr...@gmail.com wrote:
Given this program.. I have the following queries..
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
sparkConf.set("spark.master", "local[2]")
val ssc = new
The part that works is the commented out, single receiver stream below the
loop. It seems that when I call KafkaUtils.createStream more than once, I
don’t receive any messages.
I’ll dig through the logs, but at first glance yesterday I didn’t see anything
suspect. I’ll have to look closer.
Hello,
sc.textFile and so on support wildcards in their path, but apparently
sqlc.parquetFile() does not. I always receive "File
/file/to/path/*/input.parquet does not exist". Is this normal or a bug? Is
there a workaround?
Thanks
- Marius
Yes, this works. Make sure you have HADOOP_CONF_DIR set on your Spark machines
mn
On Sep 24, 2014, at 5:35 AM, Petr Novak oss.mli...@gmail.com wrote:
Hello,
if our Hadoop cluster is configured with HA and fs.defaultFS points to a
namespace instead of a namenode hostname -
What is the proper way to specify Java options for the Spark executors
using spark-submit? We had done this previously using
export SPARK_JAVA_OPTS='..'
for example to attach a debugger to each executor or to add
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps.
On spark-submit I
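For what it's worth, a sketch of the mechanism that replaces SPARK_JAVA_OPTS in Spark 1.x, the spark.executor.extraJavaOptions property; the class and jar names below are placeholders:

$SPARK_HOME/bin/spark-submit \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --class com.example.MyApp \
  myapp.jar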
Can we iterate over an RDD of Iterable[String]? How do we do that?
Because the entire Iterable[String] seems to be a single element in the RDD.
Thank You
Hi Arun!
I think you can find info at
https://spark.apache.org/docs/latest/configuration.html
quote:
Spark provides three locations to configure the system:
* Spark properties
https://spark.apache.org/docs/latest/configuration.html#spark-properties control
most application
Hi Team,
I'm getting the below exception. Could you please help me resolve this issue?
Below is my piece of code
val rdd = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
classOf[org.apache.hadoop.hbase.client.Result])
var s =rdd.map(x
One of my big Spark programs always gets stuck at 99%, where a few tasks never
finish.
I debugged it by printing out thread stacktraces, and found there are
workers stuck at parquet.hadoop.ParquetFileReader.readNextRowGroup.
Has anyone had a similar problem? I'm using Spark 1.1.0 built for HDP 2.1. The
bq. at com.paypal.risk.rds.dragon.storage.hbase.HbaseRDDBatch$$
anonfun$batchInsertEdges$3.apply(HbaseRDDBatch.scala:179)
Can you reveal what HbaseRDDBatch.scala does ?
Cheers
On Wed, Sep 24, 2014 at 8:46 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
One of my big spark program
(Forgot to add the subject, so again...)
One of my big Spark programs always gets stuck at 99%, where a few tasks never
finish.
I debugged it by printing out thread stacktraces, and found there are
workers stuck at parquet.hadoop.ParquetFileReader.readNextRowGroup.
Has anyone had a similar problem? I'm
Hi Ted,
It converts RDD[Edge] to HBase rowkeys and columns and inserts them into HBase
(in batches).
BTW, I found batched Puts actually faster than generating HFiles...
Jianshi
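For context, a rough sketch of the batched-Put pattern being described; the table name, rowkey scheme, and column layout are made up, and it assumes the HBase 0.98 client API:

import scala.collection.JavaConverters._
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

edgeRdd.foreachPartition { edges =>
  val table = new HTable(HBaseConfiguration.create(), "edges")   // one table handle per partition
  val puts = edges.map { e =>
    val p = new Put(Bytes.toBytes(s"${e.srcId}|${e.dstId}"))     // hypothetical rowkey scheme
    p.add(Bytes.toBytes("cf"), Bytes.toBytes("attr"), Bytes.toBytes(e.attr.toString))
    p
  }.toList
  table.put(puts.asJava)                                         // single batched write per partition
  table.close()
}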
On Wed, Sep 24, 2014 at 11:49 PM, Ted Yu yuzhih...@gmail.com wrote:
bq. at
Just a shot in the dark: have you checked region server (logs) to see if
region server had trouble keeping up ?
Cheers
On Wed, Sep 24, 2014 at 8:51 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi Ted,
It converts RDD[Edge] to HBase rowkey and columns and insert them to HBase
(in batch).
HBase regionservers need to be balanced...you might have some skewness in
row keys and one regionserver is under pressure...try finding that key and
replicate it using a random salt.
On Wed, Sep 24, 2014 at 8:51 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi Ted,
It converts RDD[Edge] to
I was thinking along the same line.
Jianshi:
See
http://hbase.apache.org/book.html#d0e6369
On Wed, Sep 24, 2014 at 8:56 AM, Debasish Das debasish.da...@gmail.com
wrote:
HBase regionservers need to be balanced...you might have some skewness in
row keys and one regionserver is under
The worker is not writing to HBase; it's just stuck. One task usually
finishes in around 20~40 mins, and I waited 6 hours for the buggy ones before
killing them.
Only 6 out of 3000 tasks got stuck.
Jianshi
On Wed, Sep 24, 2014 at 11:55 PM, Ted Yu yuzhih...@gmail.com wrote:
Just a shot in
Hi Debasish,
The tables are pre-split and balanced; the rows have some skew but it's not that
serious. I checked the HBase Master UI: all region servers are idle (0 requests)
and the table is not under any compaction.
Jianshi
On Wed, Sep 24, 2014 at 11:56 PM, Debasish Das debasish.da...@gmail.com
wrote:
HBase
Hi everyone,
I'm trying to understand how the windowed operations function. What I want to
achieve is the following:
val ssc = new StreamingContext(sc, Seconds(1))
val lines = ssc.socketTextStream("localhost", )
val window5 = lines.window(Seconds(5), Seconds(5)).reduce((time1:
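A minimal sketch of the general shape, since the reduce function in the mail is cut off; this just counts lines per 5-second tumbling window, and the port number is an assumption:

val lines = ssc.socketTextStream("localhost", 9999)
val window5 = lines.window(Seconds(5), Seconds(5))  // window length = slide interval = 5s
window5.count().print()                             // one count per window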
Hi Ted,
See my previous reply to Debasish: all region servers are idle. I don't
think it's caused by hotspotting.
Besides, only 6 out of 3000 tasks were stuck, and their inputs are only about
80MB each.
Jianshi
On Wed, Sep 24, 2014 at 11:58 PM, Ted Yu yuzhih...@gmail.com wrote:
I was
Adding a subject.
bq. at parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(
ParquetFileReader.java:599)
Looks like there might be some issue reading the Parquet file.
Cheers
On Wed, Sep 24, 2014 at 9:10 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi Ted,
See my
I tried newAPIHadoopFile and it works, except that my original InputFormat
extends InputFormat<Text,Text> and has a RecordReader<Text,Text>.
This throws a not-serializable exception on Text - changing the type to
InputFormat<StringBuffer, StringBuffer> works with minor code changes.
I do not, however,
Hi Team,
Could you please point me to an example program for Spark with HBase that reads
columns and values?
Regards,
Rajesh
Spark SQL reads Parquet files fine... did you follow one of these to
read/write Parquet from Spark?
http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/
On Wed, Sep 24, 2014 at 9:29 AM, Ted Yu yuzhih...@gmail.com wrote:
Adding a subject.
bq. at parquet.hadoop.ParquetFileReader$
Take a look at the following under examples:
examples/src//main/python/hbase_inputformat.py
examples/src//main/python/hbase_outputformat.py
examples/src//main/scala/org/apache/spark/examples/HBaseTest.scala
examples/src//main/scala/org/apache/spark/examples/pythonconverters/HBaseConverters.scala
Hi Ted,
Thank you for the quick response and details. I have verified the HBaseTest.scala
program; it is returning an hbaseRDD but I'm not able to retrieve values from
the hbaseRDD.
Could you help me retrieve the values from the hbaseRDD?
Regards,
Rajesh
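A rough sketch of pulling values out of the (ImmutableBytesWritable, Result) pairs that TableInputFormat produces; the column family and qualifier names are placeholders:

import org.apache.hadoop.hbase.util.Bytes

val values = hbaseRDD.map { case (_, result) =>
  val rowKey = Bytes.toString(result.getRow)
  val cell = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))  // raw bytes for one column
  (rowKey, if (cell != null) Bytes.toString(cell) else null)
}
values.take(10).foreach(println)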
On Wed, Sep 24, 2014 at 10:12 PM, Ted Yu
I use newAPIHadoopRDD with AccumuloInputFormat. It produces a PairRDD using
Accumulo's Key and Value classes, both of which extend Writable. Works like
a charm. I use the same InputFormat for all my MR jobs.
-Russ
On Wed, Sep 24, 2014 at 9:33 AM, Steve Lewis lordjoe2...@gmail.com wrote:
I
See the scaladoc for how to define an implicit ordering to use with sortByKey:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.OrderedRDDFunctions
Off the top of my head, I think this is 90% correct to order by age for example:
implicit val studentOrdering:
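Completing that thought as a sketch, assuming Student has an age field (the class definition isn't shown in this thread):

implicit val studentOrdering: Ordering[Student] = Ordering.by((s: Student) => s.age)
// then, with an RDD[(Student, String)]:
rdd.sortByKey().collect().foreach(println)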
This behavior is inherited from the parquet input format that we use. You
could list the files manually and pass them as a comma separated list.
On Wed, Sep 24, 2014 at 7:46 AM, Marius Soutier mps@gmail.com wrote:
Hello,
sc.textFile and so on support wildcards in their path, but
Thank you, that works!
On 24.09.2014, at 19:01, Michael Armbrust mich...@databricks.com wrote:
This behavior is inherited from the parquet input format that we use. You
could list the files manually and pass them as a comma separated list.
On Wed, Sep 24, 2014 at 7:46 AM, Marius Soutier
Does it make sense for us to open a JIRA to track enhancing the Parquet
input format to support wildcards? Or is this something outside of Spark's
control?
Nick
On Wed, Sep 24, 2014 at 1:01 PM, Michael Armbrust mich...@databricks.com
wrote:
This behavior is inherited from the parquet input
Hi Deep,
The Iterable trait in Scala has methods like map and reduce that you can
use to iterate over the elements of an Iterable[String]. You can also create an
Iterator from the Iterable. For example, suppose you have
val rdd: RDD[Iterable[String]]
you can do
rdd.map { x => // x has type Iterable[String]
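A small sketch of the complete pattern, assuming you just want to work with every inner string:

val lengths: RDD[Int] = rdd.map { x =>          // x: Iterable[String]
  x.map(_.length).sum                           // operate on the inner collection
}
val flat: RDD[String] = rdd.flatMap(identity)   // or flatten into one RDD of strings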
Maybe differences between JavaPairDStream and JavaPairReceiverInputDStream?
On Wed, Sep 24, 2014 at 7:46 AM, Matt Narrell matt.narr...@gmail.com wrote:
The part that works is the commented out, single receiver stream below the
loop. It seems that when I call KafkaUtils.createStream more than
Do your custom Writable classes implement Serializable - I think that is
the only real issue - my code uses vanilla Text
No, they do not implement Serializable. There are a couple of places where
I've had to do a Text-String conversion but generally it hasn't been a
problem.
-Russ
On Wed, Sep 24, 2014 at 10:27 AM, Steve Lewis lordjoe2...@gmail.com wrote:
Do your custom Writable classes implement Serializable - I
Hi,
Why does RDD.saveAsObjectFile() to S3 leave a bunch of *_$folder$ empty
files around? Is it possible for the save to clean up?
tks
Do you have YARN_CONF_DIR set in your environment to point Spark to where your
yarn configs are?
Greg
From: Raghuveer Chanda
raghuveer.cha...@gmail.com
Date: Wednesday, September 24, 2014 12:25 PM
To:
This just shows the driver. Click the Executors tab in the Spark UI
mn
On Sep 24, 2014, at 11:25 AM, Raghuveer Chanda raghuveer.cha...@gmail.com
wrote:
Hi,
I'm new to Spark and facing a problem with running a job in a cluster using YARN.
Initially I ran jobs using the Spark master as --master
Hmmm - I have only tested in local mode, but I got a
java.io.NotSerializableException: org.apache.hadoop.io.Text
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1180)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
at
Hi,
I was able to get the training running in local mode with default settings;
there was a problem with document labels, which were quite large (not 20 as
suggested earlier).
I am currently training 175000 documents on a single node with 2GB of
executor memory and 5GB of driver memory
You'll need to look at the driver output to have a better idea of
what's going on. You can use yarn logs --applicationId blah after
your app is finished (e.g. by killing it) to look at it.
My guess is that your cluster doesn't have enough resources available
to service the container request
Thanks for the reply. I have a doubt as to which path to set for
YARN_CONF_DIR.
My /etc/hadoop folder has the following sub folders
conf conf.cloudera.hdfs conf.cloudera.mapreduce conf.cloudera.yarn
and both conf and conf.cloudera.yarn folders have yarn-site.xml. As of now
I set the variable as
The screenshot executors.8080.png is of the Executors tab itself, and only the
driver is added, without workers, even though I set the master to yarn-cluster.
On Wed, Sep 24, 2014 at 11:18 PM, Matt Narrell matt.narr...@gmail.com
wrote:
This just shows the driver. Click the Executors tab in the Spark UI
Really? What should we make of this?
24 Sep 2014 10:03:36,772 ERROR [Executor task launch worker-52] Executor -
Exception in task ID 40599
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:789)
at
Never mind: https://issues.apache.org/jira/browse/SPARK-1476
On Wed, Sep 24, 2014 at 11:10 AM, Victor Tso-Guillen v...@paxata.com
wrote:
Really? What should we make of this?
24 Sep 2014 10:03:36,772 ERROR [Executor task launch worker-52] Executor -
Exception in task ID 40599
Does this work with spark-sql in 1.0.1 too? I tried it like this:
sqlContext.sql("SET spark.sql.autoBroadcastJoinThreshold=1;")
But it still seems to trigger ShuffleMapTasks, and the amount of shuffle is the
same with/without this parameter...
Kindly request some help here
Thanks
We could certainly do this. The comma separated support is something I
added.
On Wed, Sep 24, 2014 at 10:20 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
Does it make sense for us to open a JIRA to track enhancing the Parquet
input format to support wildcards? Or is this something
It's really Hadoop's support for S3. Hadoop FS semantics need
directories, and S3 doesn't have a proper notion of directories.
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/s3native/NativeS3FileSystem.html
You should leave them AFAIK.
On Wed, Sep 24, 2014 at 6:41 PM, Shay Seng
Thanks for the reply. This is the error in the logs obtained from the UI at
http://dml3:8042/node/containerlogs/container_1411578463780_0001_02_01/chanda
So now, how do I set the Log Server URL?
Failed while trying to construct the redirect url to the log server. Log
Server url may not be
I'm afraid SparkSQL isn't an option for my use case, so I need to use the
Spark API itself.
I turned off Kryo, and I'm getting a NullPointerException now:
scala val ref = file.take(1)(0)._2
ref: org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable =
You need to use the command line yarn application that I mentioned
(yarn logs). You can't look at the logs through the UI after the app
stops.
On Wed, Sep 24, 2014 at 11:16 AM, Raghuveer Chanda
raghuveer.cha...@gmail.com wrote:
Thanks for the reply .. This is the error in the logs obtained from
Hi all,
I have just updated to Spark 1.1.0. The new row representation of the data
in Spark SQL is very handy.
I have noticed that it does not set None for NULL values coming from Hive if
the column was of string type - it seems to work with other types.
Is that something that will be implemented?
Could you create a JIRA and add test cases for it? Thanks!
Davies
On Wed, Sep 24, 2014 at 11:56 AM, jamborta jambo...@gmail.com wrote:
Hi all,
I have just updated to Spark 1.1.0. The new row representation of the data
in Spark SQL is very handy.
I have noticed that it does not set None for
Yeah, I got the logs and it's reporting about memory.
14/09/25 00:08:26 WARN YarnClusterScheduler: Initial job has not accepted
any resources; check your cluster UI to ensure that workers are registered
and have sufficient memory
Now I shifted to a bigger cluster with more memory, but here I'm not
I used the following code as an example to deserialize
BytesRefArrayWritable.
http://www.massapi.com/source/hive-0.5.0-dev/src/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java.html
Best Regards,
Cem.
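A rough sketch of decoding one column from those rows, based on my reading of the BytesRefArrayWritable API and assuming the RDD[(LongWritable, BytesRefArrayWritable)] named file from earlier in the thread; the column index and charset are assumptions:

val firstCol = file.map { case (_, row) =>
  val ref = row.get(0)                                           // column 0 of the row
  new String(ref.getData, ref.getStart, ref.getLength, "UTF-8")  // decode the raw bytes
}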
On Wed, Sep 24, 2014 at 1:34 PM, Pramod Biligiri pramodbilig...@gmail.com
wrote:
Hey,
I actually have 2 questions.
(1) I want to generate unique IDs for each RDD element and I want to
assign them in parallel, so I do
rdd.mapPartitionsWithIndex((index, s) => {
  var count = 0L
  s.zipWithIndex.map {
    case (t, i) => {
      count += 1
      (index *
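For what it's worth, a hedged sketch of the same idea; Spark also provides RDD.zipWithUniqueId(), which does this without hand-rolled counters, assuming globally unique (not necessarily consecutive) IDs are enough:

val withIds = rdd.zipWithUniqueId()   // RDD[(T, Long)], IDs unique across partitions
// or roughly what the snippet above builds by hand, with an assumed ID scheme:
val manual = rdd.mapPartitionsWithIndex { (index, it) =>
  it.zipWithIndex.map { case (t, i) => (t, index.toLong * Int.MaxValue + i) }
}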
Dear Spark Users,
I'd highly appreciate any comments on whether there is a way to define an
Avro table and then run select * without errors. Thank you very much in
advance!
Ilya.
Attached please find a sample data directory and the Avro schema.
analytics_events2.zip
Hi,
testing out the Spark Ec2 deployment plugin:
I try to compile using
$sbt sparkLaunchCluster
---
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[warn]
Hello,
I'm learning about Spark Streaming and I'm really excited.
Today I was testing packaging some apps and submitting them to a standalone
cluster on my computer locally.
That worked OK.
So,
I created one virtual machine with a network bridge, and I tried to submit the
app again to this VM from my local
I solved this question using the SBT plugin sbt-assembly.
It's very good!
Bye.
Thank you for the suggestion!
Bye.
One more piece of information:
when I submit the application from my local PC to my VM,
the VM is the master and worker, and my local PC isn't part of the
cluster.
Thanks.
I am attempting to use Spark Streaming to summarize event data streaming in
and save it to a MySQL table. The input source data is stored in 4 topics on
Kafka with each topic having 12 partitions. My approach to doing this worked
both in local development and a simulated load testing environment
Archit, were you able to get it to work with 1.0.0?
I tried the --files suggestion from Marcelo and it just changed logging for my
client; the appmaster and executors were still the same.
~Vipul
On Jul 30, 2014, at 9:59 PM, Archit Thakur archit279tha...@gmail.com wrote:
Hi Marcelo,
Oh, I should add that I've tried a range of batch durations and reduce-by-window
durations to no effect. I'm not too sure how to choose these.
Currently I've been testing with batch durations of 1 minute to 10
minutes and reduce window durations of 10 or 20 minutes.
Another update: actually it just hit me that my problem is probably right here:
https://gist.github.com/maddenpj/74a4c8ce372888ade92d#file-gistfile1-scala-L22
I'm creating a JDBC connection on every record; that's probably what's
killing the performance. I assume the fix is just to broadcast the
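For reference, a rough sketch of the usual alternative: one connection per partition inside foreachPartition rather than broadcasting a connection (which isn't serializable); the JDBC URL, credentials, and table are placeholders:

stream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    val conn = java.sql.DriverManager.getConnection("jdbc:mysql://host/db", "user", "pass")
    val stmt = conn.prepareStatement("INSERT INTO summary (k, v) VALUES (?, ?)")
    records.foreach { case (k, v) =>
      stmt.setString(1, k); stmt.setLong(2, v)
      stmt.executeUpdate()
    }
    conn.close()                        // one connection per partition, not per record
  }
}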
Setting spark.mesos.coarse=true helped reduce the time on the mesos cluster
from 17 min to around 6 min. The scheduler delay per task reduced from 40
ms to around 10 ms.
thanks
On Mon, Sep 22, 2014 at 12:36 PM, Xiangrui Meng men...@gmail.com wrote:
1) MLlib 1.1 should be faster than 1.0 in
Note, the data is random numbers (double).
Any suggestions/pointers will be highly appreciated.
Thanks,
Shailesh
Hi All,
When I try to load a dataset using MLUtils.loadLibSVMFile, I have the following
problem. Any help will be greatly appreciated!
Code snippet:
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD
import
I've been seeing a failure in JettyUtils.createStaticHandler when running Spark
1.1 programs from Eclipse, which I hadn't seen before. The problem seems to be
line 129:
Option(Utils.getSparkClassLoader.getResource(resourceBase)) match { ...
getSparkClassLoader returns null, and it's all
If you launched the job in yarn-cluster mode, the tracking URL is
printed on the output of the launched process. That will lead you to
the Spark UI once the job is running.
If you're using CM, you can reach the same link by clicking on the
Resource Manager UI link on your Yarn service, then
Sounds like spark-01 is not resolving correctly on your machine (or
is the wrong address). Can you ping spark-01 and does that reach the
VM where you set up the Spark Master?
On Wed, Sep 24, 2014 at 1:12 PM, danilopds danilob...@gmail.com wrote:
Hello,
I'm learning about Spark Streaming and I'm
Hi Sameer,
This seems to be a file format issue. Can you make sure that your data
satisfies the format?
Each line of libsvm format is as follows:
label index1:value1 index2:value2 ...
Thanks,
Liquan
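For illustration, a valid libsvm line looks like this (a label, then 1-based index:value pairs):

1.0 1:0.5 3:2.0

and loading it in Spark (the path is a placeholder):

val data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm.txt")  // RDD[LabeledPoint]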
On Wed, Sep 24, 2014 at 3:02 PM, Sameer Tilak ssti...@live.com wrote:
Hi All,
When I
You only need to define an ordering for Student; no need to modify the class
definition of Student. It's like a Comparator class in Java.
Currently, you have to map the RDD to sort by value.
Liquan
On Wed, Sep 24, 2014 at 9:52 AM, Sean Owen so...@cloudera.com wrote:
See the scaladoc for how to
Hi Manish!
Thanks for the reply and the explanation of why the branches won't compile
-- that makes perfect sense.
You mentioned making the branch compatible with the latest master; could
you share some more details? Which branch do you mean -- is it on your
GitHub? And would I just be able to
Hail, Adamantios Korais... if that is indeed your name.
Just to follow up on Liquan, you might be interested in removing the
thresholds, and then treating the predictions as a probability from 0..1
inclusive. SVM with the linear kernel is a straightforward linear
classifier -- so you with the
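A hedged sketch of what removing the threshold looks like with MLlib's SVM in Spark 1.1: clearThreshold() makes predict return the raw score instead of a 0/1 label (whether you rescale that score to 0..1 afterwards is up to you):

import org.apache.spark.mllib.classification.SVMWithSGD

val model = SVMWithSGD.train(trainingData, 100)  // trainingData: RDD[LabeledPoint], 100 iterations
model.clearThreshold()                           // predict now returns raw margins
val scores = testData.map(p => (model.predict(p.features), p.label))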