Hi,
I'm not completely sure about this either, but this is what we are doing
currently:
Configure your logging to write to STDOUT, not to a file explicitly. Spark
will capture stdout and stderr and separate the messages into an app/driver
folder structure in the configured worker directory.
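For reference, a minimal log4j.properties along these lines routes everything
to the console (the appender name is just illustrative):

```properties
# Send all log output to stdout; Spark's worker captures it into
# the per-app/driver folder structure.
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```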
We
If you are using 127.0.0.1, please check /etc/hosts and comment out the
127.0.1.1 entry, or create one that maps to localhost.
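For example, an /etc/hosts fragment of this shape (the hostname is the one
from the error in this thread, purely as an illustration):

```
127.0.0.1   localhost
# Either comment out the 127.0.1.1 line, or point it at the machine's
# actual hostname so lookups resolve:
127.0.1.1   dhcp-10-35-14-100
```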
On Sat, Mar 21, 2015 at 9:57 AM, Ted Yu yuzhih...@gmail.com wrote:
bq. Caused by: java.net.UnknownHostException: dhcp-10-35-14-100: Name or
service not known
Can you check
Hi Sean,
It's getting strange now. If I run from the IDE, my executor memory is always
set to 6.7G, no matter what value I set in code. I have checked my
environment variables, and there's no value of 6.7 or 12.5.
Any idea?
Thanks,
David
On Tue, 17 Mar 2015 00:35 null jishnu.prat...@wipro.com wrote:
No, I didn't mean a local cluster. I mean running locally, as in an IDE.
On Mon, 16 Mar 2015 23:12 xu Peng hsxup...@gmail.com wrote:
Hi David,
You can try the local-cluster.
the numbers in local-cluster[2,2,1024] mean 2 workers, 2 cores per worker,
and 1024 MB of memory per worker
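For instance (a sketch; the app name is a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// local-cluster[2,2,1024]: 2 workers, 2 cores per worker, 1024 MB per worker
val conf = new SparkConf()
  .setAppName("local-cluster-demo")
  .setMaster("local-cluster[2,2,1024]")
val sc = new SparkContext(conf)
```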
Best Regards
Peng Xu
Mike:
Once Hadoop 2.7.0 is released, you should be able to enjoy the enhanced
performance of s3a.
See HADOOP-11571
Cheers
On Sat, Mar 21, 2015 at 8:09 AM, Chris Fregly ch...@fregly.com wrote:
hey mike!
you'll definitely want to increase your parallelism by adding more shards
to the stream -
hey mike!
you'll definitely want to increase your parallelism by adding more shards to
the stream - as well as spinning up 1 receiver per shard and unioning all the
shards per the KinesisWordCount example that is included with the kinesis
streaming package.
you'll need more cores (cluster) or
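the shape of that pattern is roughly this (a sketch; stream name, endpoint,
and shard count are placeholders, and ssc is an existing StreamingContext):

```scala
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.kinesis.KinesisUtils

val numShards = 4  // match this to your stream's shard count
// One receiver per shard...
val streams = (0 until numShards).map { _ =>
  KinesisUtils.createStream(ssc, "myStream",
    "https://kinesis.us-east-1.amazonaws.com", Seconds(2),
    InitialPositionInStream.LATEST, StorageLevel.MEMORY_AND_DISK_2)
}
// ...then union them into a single DStream.
val unioned = ssc.union(streams)
```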
Hi,
I wonder if someone can help suggest a solution to my problem. I had a simple
process working using Strings and now want to convert to RDD[Char]; the
problem is that I end up with a nested call, as follows:
1) Load a text file into an RDD[Char]
val inputRDD =
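The snippet was cut off here; one plausible shape for that first step (the
path is a placeholder, and note that textFile drops the newline characters
between lines):

```scala
val inputRDD: org.apache.spark.rdd.RDD[Char] =
  sc.textFile("input.txt").flatMap(line => line.toSeq)
```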
Hey Eason!
Weird problem indeed. More information will probably help to find the issue:
Have you searched the logs for peculiar messages?
What does your Spark environment look like? #workers, #threads, etc.?
Does it work if you create separate receivers for the topics?
Regards,
Jeff
2015-03-21
If you are running from your IDE, then I don't know what you are
running or in what mode. The discussion here concerns using standard
mechanisms like spark-submit to configure executor memory. Please try
these first instead of trying to directly invoke Spark, which will
require more understanding
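For reference, the standard invocation looks something like this (class name,
master URL, and jar are placeholders):

```shell
# --executor-memory is the supported way to size executors here
spark-submit \
  --class com.example.MyApp \
  --master spark://master:7077 \
  --executor-memory 2g \
  my-app.jar
```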
Hi,
Apologies for the generic question.
I am developing predictive models for the first time, and the model will be
deployed in production very soon.
Could somebody help me with model deployment in production? I have read
quite a few articles on model deployment and have read some books on
bq. Requesting 1 new executor(s) because tasks are backlogged
1 executor was requested.
Which hadoop release are you using ?
Can you check resource manager log to see if there is some clue ?
Thanks
On Fri, Mar 20, 2015 at 4:17 PM, Manoj Samel manojsamelt...@gmail.com
wrote:
Forgot to add -
Thank you for your help Akhil! We found that remotely connecting from our
laptop to the remote Spark cluster no longer works, but it works if the
client is on the remote cluster as well, starting from version 1.2.0 and
beyond (v1.1.1 and below are fine). Not sure if this is related
1. make sure your secret key doesn't have a / in it. If it does, generate a
new key.
2. The jets3t and Hadoop JAR versions need to be in sync; jets3t 0.9.0 was
picked up in Hadoop 2.4 and not AFAIK
3. Hadoop 2.6 has a new S3 client, s3a, which is compatible with s3n data. It
uses the AWS toolkit
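A minimal credentials fragment for core-site.xml, assuming the standard s3a
keys (values are placeholders):

```xml
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```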
I believe that you can get what you want by using HiveQL instead of the
pure programmatic API. This is a little verbose, so perhaps a specialized
function would also be useful here. I'm not sure I would call it
saveAsExternalTable, as there are also external Spark SQL data source
tables that have
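The HiveQL route looks roughly like this (a sketch; assumes a HiveContext
named sqlContext, with table name, schema, and path as placeholders):

```scala
sqlContext.sql("""
  CREATE EXTERNAL TABLE my_table (key INT, value STRING)
  STORED AS PARQUET
  LOCATION '/user/hive/warehouse/my_table'
""")
```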
Is there a module in Spark Streaming that lets you listen to
the alerts/conditions as they happen in the streaming job? Generally
Spark Streaming components execute on large clusters alongside stores like
HDFS or Cassandra; however, when it comes to alerting, you generally can't
send it directly from
I have a couple of data frames that I pulled from SparkSQL and the primary
key of one is a foreign key of the same name in the other. I'd rather not
have to specify each column in the SELECT statement just so that I can
rename this single column.
When I try to join the data frames, I get an
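The shape of the join I am attempting (df1, df2, and the key name "id" are
placeholders):

```scala
// Both frames carry an "id" column; this join keeps two "id" columns,
// which makes later references to "id" ambiguous.
val joined = df1.join(df2, df1("id") === df2("id"))
```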
Now, I am not able to directly use my RDD object and have it implicitly
become a DataFrame. It becomes a DataFrameHolder, with which I could
write:
rdd.toDF.registerTempTable("foo")
The rationale here was that we added a lot of methods to DataFrame and made
the implicits more
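A minimal sketch of that flow (names are placeholders; assumes an existing
SparkContext sc):

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._  // brings the RDD-to-DataFrameHolder implicit into scope

case class Record(id: Int, name: String)
val rdd = sc.parallelize(Seq(Record(1, "a"), Record(2, "b")))

// The implicit wraps the RDD in a DataFrameHolder; toDF converts it.
rdd.toDF().registerTempTable("foo")
```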
In the log, I saw
MemoryStorage: MemoryStore started with capacity 6.7GB
But I still cannot find where to set this storage capacity.
On Sat, 21 Mar 2015 20:30 Xi Shen davidshe...@gmail.com wrote:
Hi Sean,
It's getting strange now. If I run from the IDE, my executor memory is always
set to
Hi,
I use the *OpenBLAS* DLL, and have configured my application to work in the
IDE. When I start my Spark application from the IntelliJ IDE, I can see in
the log that the native lib is loaded successfully.
But if I use *spark-submit* to start my application, the native lib still
cannot be loaded. I saw
Yeah, I think it is harder to troubleshoot the properties issues in an IDE.
But the reason I stick to the IDE is that if I use spark-submit, the BLAS
native library cannot be loaded. Maybe I should open another thread to
discuss that.
Thanks,
David
On Sun, 22 Mar 2015 10:38 Xi Shen davidshe...@gmail.com
Hi,
Does anyone have concrete recommendations on how to reduce Spark's logging
verbosity?
We have attempted on several occasions to address this by setting various
log4j properties, both in configuration property files and in
$SPARK_HOME/conf/spark-env.sh; however, all of those attempts have
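The sort of thing we have tried in conf/log4j.properties (a sketch; the
logger names are just the usual suspects):

```properties
# Raise the threshold to quiet Spark's INFO chatter
log4j.rootCategory=WARN, console
# Silence particularly chatty packages further
log4j.logger.org.apache.spark=WARN
log4j.logger.org.eclipse.jetty=ERROR
```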
Can you try the --driver-library-path option ?
spark-submit --driver-library-path /opt/hadoop/lib/native ...
Cheers
On Sat, Mar 21, 2015 at 4:58 PM, Xi Shen davidshe...@gmail.com wrote:
Hi,
I use the *OpenBLAS* DLL, and have configured my application to work in the
IDE. When I start my Spark
Hello,
I am trying to install Spark 1.3.0 on my Mac. Earlier, I was working with
Spark 1.1.0. Now I come across this error:
sbt.ResolveException: unresolved dependency:
org.apache.spark#spark-network-common_2.10;1.3.0: configuration not public
in
bq. the BLAS native cannot be loaded
Have you tried specifying --driver-library-path option ?
Cheers
On Sat, Mar 21, 2015 at 4:42 PM, Xi Shen davidshe...@gmail.com wrote:
Yeah, I think it is harder to troubleshoot the properties issues in an IDE.
But the reason I stick to the IDE is that if I
Hi Shashidhar,
Our team at PredictionIO is trying to solve the production deployment of
model. We built a powered-by-Spark framework (also certified on Spark by
Databricks) that allows a user to build models with everything available
from the Spark API, persist the model automatically with
Hi,
I have two big RDDs, and I need to do some math against each pair of
elements. Traditionally, this would be a nested for-loop. But for RDDs, that
causes a nested RDD, which is prohibited.
Currently, I am collecting one of them and then doing a nested for-loop to
avoid the nested RDD. But I would like to know if
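For concreteness, the shape of what I am doing now (rdd1/rdd2 and the math
are placeholders):

```scala
// compute stands in for the real per-pair math
def compute(x: Double, y: Double): Double = x * y

val small: Array[Double] = rdd2.collect()  // pull one side to the driver
val results = rdd1.flatMap(x => small.map(y => compute(x, y)))
```

(I know about rdd1.cartesian(rdd2), which produces the same pairs without
collecting, but it implies a large shuffle.)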