Some inputs will be really helpful.
Thanks,
-Vibhor
On Fri, May 30, 2014 at 7:51 PM, Vibhor Banga vibhorba...@gmail.com wrote:
Hi all,
I am planning to use spark with HBase, where I generate RDD by reading
data from HBase Table.
I want to know that in the case when the size of HBase
Clearly there will be an impact on performance, but frankly it depends on what you
are trying to achieve with the dataset.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Sat, May 31, 2014 at 11:45 AM, Vibhor Banga
You can increase your Akka timeout; that should buy you some more time. Are
you running out of memory, by any chance?
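For reference, the relevant Akka settings can be raised in spark-defaults.conf (property names from the Spark 1.x configuration docs; the values below are only illustrative, not recommendations):

```
spark.akka.timeout      300
spark.akka.askTimeout   60
```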
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Sat, May 31, 2014 at 6:52 AM, Michael Chang
The documentation you looked at is not official, though it is from
@pwendell's website. It was for the Spark SQL release. Please find the
official documentation here:
http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-support-vector-machine-svm
It contains a working example.
Hi Tobias,
One hack you can try is:
rdd.mapPartitions(iter => {
  val x = new X()
  iter.map(row => x.doSomethingWith(row)) ++ { x.shutdown(); Iterator.empty }
})
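The trick above works because Iterator's `++` takes its right-hand operand by name, so the `{ x.shutdown(); Iterator.empty }` block only runs once the mapped iterator is exhausted. A minimal plain-Scala sketch of the same pattern (no Spark required; `X` here is a hypothetical resource class, not anything from Tobias's code):

```scala
// X stands in for some expensive per-partition resource.
class X {
  var closed = false
  def doSomethingWith(row: Int): Int = row * 2
  def shutdown(): Unit = { closed = true }
}

object PerPartitionResource {
  def main(args: Array[String]): Unit = {
    val x = new X()
    // The second operand of ++ is by-name, so shutdown() only runs
    // after the mapped iterator has been fully consumed.
    val out = Iterator(1, 2, 3).map(row => x.doSomethingWith(row)) ++
      { x.shutdown(); Iterator.empty }
    println(out.toList) // List(2, 4, 6)
    println(x.closed)   // true -- shutdown ran after the last element
  }
}
```

Note that if a task consumes only part of the iterator, the cleanup block never runs, which is the main caveat of this hack.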
Best,
Xiangrui
On Thu, May 29, 2014 at 11:38 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
I want to use an object x in my RDD
Hi,
scenario : Read data from HDFS and apply hive query on it and the result
is written back to HDFS.
Schema creation, querying, and saveAsTextFile are working fine with the
following modes:
- local mode
- mesos cluster with single node
- spark cluster with multi node
Schema creation and
Hi there, Patrick. Thanks for the reply...
It wouldn't surprise me that AWS Ubuntu has Python 2.7. Ubuntu is cool like
that. :-)
Alas, the Amazon Linux AMI (2014.03.1) does not, and it's the very first
one on the recommended instance list (Ubuntu is #4, after Amazon, RedHat,
SUSE). So, users
Can you look at the logs from the executor or in the UI? They should
give an exception with the reason for the task failure. Also in the
future, for this type of e-mail please only e-mail the user@ list
and not both lists.
- Patrick
On Sat, May 31, 2014 at 3:22 AM, prabeesh k
Hey There,
You can remove an accumulator by just letting it go out of scope and
it will be garbage collected. For broadcast variables we actually
store extra information for it, so we provide hooks for users to
remove the associated state. There is no such need for accumulators,
though.
-
Currently, an executor is always run in its own JVM, so it should be
possible to just use some static initialization to e.g. launch a
sub-process and set up a bridge with which to communicate.
This would be a fairly advanced use case, however.
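A minimal sketch of that idea in plain Scala, assuming a hypothetical `ExecutorBridge` singleton standing in for the sub-process handle (a `lazy val` inside a Scala `object` is the "static initialization" in question — its body runs at most once per JVM, i.e. once per executor, however many tasks touch it):

```scala
// Hypothetical per-JVM bridge; in real code the lazy val body would
// spawn the sub-process and return a handle to it.
object ExecutorBridge {
  @volatile var initCount = 0
  lazy val bridge: String = {
    initCount += 1 // in real code: launch the sub-process here
    "connected"
  }
}

object StaticInitDemo {
  def main(args: Array[String]): Unit = {
    // Simulate several tasks in one JVM all touching the bridge.
    val results = (1 to 4).map(_ => ExecutorBridge.bridge)
    println(results.distinct.size)    // 1 -- every task sees the same handle
    println(ExecutorBridge.initCount) // 1 -- initialized exactly once
  }
}
```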
- Patrick
On Thu, May 29, 2014 at 8:39 PM,
1. ctx is an instance of JavaSQLContext but the textFile method is called as
a member of ctx.
According to the API, JavaSQLContext does not have such a member, so I'm
guessing this should be sc instead.
Yeah, I think you are correct.
2. In that same code example the object sqlCtx is
1) Is there a guarantee that a partition will only be processed on a node
which is in the getPreferredLocations set of nodes returned by the RDD ?
No, there isn't; by default Spark may schedule in a non-preferred
location after `spark.locality.wait` has expired.
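To make the scheduler hold out longer for a preferred location, that wait can be raised in spark-defaults.conf (the value below is illustrative; in Spark 1.x it is a wait in milliseconds, defaulting to 3000):

```
spark.locality.wait   10000
```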
I think there are a few ways to do this... the simplest one might be to
manually build a set of comma-separated paths that excludes the bad file,
and pass that to textFile().
When you call textFile() under the hood it is going to pass your filename
string to hadoopFile() which calls
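As a sketch of the first approach (the file names below are hypothetical; in practice you would list the directory with Hadoop's FileSystem API before filtering):

```scala
// Build a comma-separated path string that skips a known-bad file, then
// pass it to sc.textFile(...), which accepts comma-separated paths.
object ExcludeBadFile {
  def goodPaths(all: Seq[String], bad: Set[String]): String =
    all.filterNot(bad.contains).mkString(",")

  def main(args: Array[String]): Unit = {
    val all = Seq(
      "hdfs://nn/logs/part-00000",
      "hdfs://nn/logs/part-00001", // the corrupt file
      "hdfs://nn/logs/part-00002")
    val paths = goodPaths(all, Set("hdfs://nn/logs/part-00001"))
    println(paths)
    // then: sc.textFile(paths)
  }
}
```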
What instance types did you launch on?
Sometimes you also get a bad individual machine from EC2. It might help to
remove the node it’s complaining about from the conf/slaves file.
Matei
On May 30, 2014, at 11:18 AM, PJ$ p...@chickenandwaffl.es wrote:
Hey Folks,
I'm really having quite a
That's a neat idea. I'll try that out.
On Sat, May 31, 2014 at 2:45 PM, Patrick Wendell pwend...@gmail.com wrote:
I think there are a few ways to do this... the simplest one might be to
manually build a set of comma-separated paths that excludes the bad file,
and pass that to textFile().
I'm running the following code to load an entire directory of Avros using
hadoopRDD.
val input = "hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/29/22/*"
// Set up the path for the job via a Hadoop JobConf
val jobConf = new JobConf(sc.hadoopConfiguration)
jobConf.setJobName("Test Scala Job")
Hi all,
I launched a Spark cluster on EC2 with Spark version v1.0.0-rc3. Everything
goes well except that I cannot access the application details on the web UI:
I click on the application name, but there is no response. Has anyone seen
this before? Is this a bug?
Thanks!
Hi all,
I tried a couple of ways, but couldn't get it to work.
The following seems to be what the online document (
http://spark.apache.org/docs/latest/running-on-yarn.html) is suggesting:
SPARK_JAR=hdfs://test/user/spark/share/lib/spark-assembly-1.0.0-hadoop2.2.0.jar
Yep, I just issued a pull request.
Yadid
On 5/31/14, 1:25 PM, Patrick Wendell wrote:
1. ctx is an instance of JavaSQLContext but the textFile method is called as
a member of ctx.
According to the API, JavaSQLContext does not have such a member, so I'm
guessing this should be sc instead.
Yeah,
Hi,
I am trying to run an example on AMAZON EC2 and have successfully
set up one cluster with two nodes on EC2. However, when I was testing an
example using the following command,
./run-example org.apache.spark.examples.GroupByTest spark://`hostname`:7077
I got the following
It's been another day of spinning up dead clusters...
I thought I'd finally worked out what everyone else knew - don't use the
default AMI - but I've now run through all of the official quick-start
linux releases and I'm none the wiser:
Amazon Linux AMI 2014.03.1 - ami-7aba833f (64-bit)