You've asked for total cores to be 2, + 1 for the driver (since you are running in
cluster mode, it's running on one of the slaves).
Change total cores to be 3*2.
Change submit mode to client - you'll have full utilization.
(btw it's not advisable to use all cores of a slave... since there are OS
processes and
Try to assemble log4j.xml or log4j.properties in your jar... you'll probably
get what you want; however, pay attention that when you move to a multinode
cluster there will be a difference.
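A hedged sketch of the suggested submission (flags per the spark-submit docs;
the master URL, class, and jar names are illustrative):

spark-submit \
  --master spark://master:7077 \
  --deploy-mode client \
  --total-executor-cores 6 \
  --class com.example.MyApp \
  myapp.jar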
On 20 November 2015 at 05:10, Afshartous, Nick
wrote:
>
> < log4j.properties file
Hello,
I just wanted to use sc._jsc.hadoopConfiguration().set('key','value') in
pyspark 1.5.2, but I got a "set method does not exist" error.
Does anyone know a workaround to set HDFS-related properties
like dfs.blocksize?
Thanks in advance!
Tamas
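One hedged workaround, independent of the py4j accessor: Hadoop properties can
be passed with the spark.hadoop. prefix, which Spark copies into the Hadoop
Configuration it creates. In Scala it looks like the sketch below; from pyspark
the same property can be set on the SparkConf or via --conf on spark-submit.

import org.apache.spark.{SparkConf, SparkContext}

// Properties prefixed with "spark.hadoop." end up in the Hadoop Configuration
val conf = new SparkConf()
  .setAppName("BlocksizeExample")                 // name is illustrative
  .set("spark.hadoop.dfs.blocksize", "134217728") // 128 MB, illustrative
val sc = new SparkContext(conf)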
Hi Todd,
Could you please provide an example of doing this? Mazerunner seems to be doing
something similar with Neo4j, but it goes via HDFS and updates only the graph
properties. Is there a direct way to do this with Neo4j or Titan?
Regards,
Ashish
From: SLiZn Liu
Hi,
I have Spark application which contains the following segment:
val repartitioned = rdd.repartition(16)
val filtered: RDD[(MyKey, myData)] = MyUtils.filter(repartitioned,
startDate, endDate)
val mapped: RDD[(DateTime, myData)] =
filtered.map(kv => (kv._1.processingTime, kv._2))
val reduced:
Hi everyone,
I was wondering if there is a better way to drop multiple columns from a
dataframe, or why there is no drop(cols: Column*) method in the DataFrame
API.
Indeed, I tend to write code like this:
val filteredDF = df.drop("colA")
.drop("colB")
.drop("colC")
//etc
which is a
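A hedged workaround sketch for the Spark 1.5-era API (no varargs drop): select
the complement of the columns you want to remove. df is the DataFrame from the
question; the column names are illustrative.

import org.apache.spark.sql.functions.col

val toDrop = Set("colA", "colB", "colC")
// keep every column that is not in the drop set
val filteredDF = df.select(df.columns.filterNot(toDrop).map(col): _*)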
I'm running into all kinds of problems with Spark 1.5.1 -- does anyone have
a version that's working smoothly for them?
On Fri, Nov 20, 2015 at 10:50 AM, Dean Wampler
wrote:
> I didn't expect that to fail. I would call it a bug for sure, since it's
> practically useless
Hi All,
If write ahead logs are enabled in Spark Streaming, does all the received
data get written to the HDFS path, or does it only write the metadata?
How does cleanup work? Does the HDFS path get bigger and bigger every day?
Do I need to write a cleanup job to delete data from the write ahead logs?
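For reference, a minimal hedged sketch of turning the write ahead log on for a
receiver-based stream (property name per the Spark Streaming docs; the path and
batch interval are illustrative):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("WALExample")
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
val ssc = new StreamingContext(conf, Seconds(10))
// WAL files are stored under the checkpoint directory
ssc.checkpoint("hdfs:///path/to/checkpoint")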
Mind trying the 1.5.2 release?
Thanks
On Fri, Nov 20, 2015 at 10:56 AM, Walrus theCat
wrote:
> I'm running into all kinds of problems with Spark 1.5.1 -- does anyone
> have a version that's working smoothly for them?
>
> On Fri, Nov 20, 2015 at 10:50 AM, Dean Wampler
I think my problem persists whether I use Kafka or sockets. Or am I wrong?
How would you use Kafka here?
On Fri, Nov 20, 2015 at 7:12 PM, Christian wrote:
> Have you considered using Kafka?
>
> On Fri, Nov 20, 2015 at 6:48 AM Saiph Kappa wrote:
>
>>
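For reference, a hedged sketch of the direct Kafka approach (Spark 1.x
spark-streaming-kafka API; the broker list and topic are illustrative, and ssc
is an existing StreamingContext):

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val topics = Set("events")
// a DStream of (key, value) pairs read directly from Kafka partitions
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder,
  StringDecoder](ssc, kafkaParams, topics)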
Hello.
I'm seeing an error creating a Hive Context moving from Spark 1.4.1 to
1.5.2. Has anyone seen this issue?
I'm invoking the following:
new HiveContext(sc) // sc is a Spark Context
I am seeing the following error:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding
The 1.5.2 Spark was compiled using the following options: mvn
-Dhadoop.version=2.6.1 -Dscala-2.11 -DskipTests -Pyarn -Phive
-Phive-thriftserver clean package
Regards,
Bryan Jeffrey
On Fri, Nov 20, 2015 at 2:13 PM, Bryan Jeffrey
wrote:
> Hello.
>
> I'm seeing an error
You're confused about which parts of your code are running on the driver vs
the executor, which is why you're getting serialization errors.
Read
http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd
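The core pattern that guide recommends, as a sketch (createNewConnection is a
placeholder for whatever connection setup you need):

dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // runs on the executor: create the connection where the data lives,
    // once per partition, instead of trying to serialize it from the driver
    val connection = createNewConnection() // placeholder helper
    partitionOfRecords.foreach(record => connection.send(record))
    connection.close()
  }
}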
On Fri, Nov 20, 2015 at 1:07 PM, Saiph
Nevermind. I had a library dependency that still had the old Spark version.
On Fri, Nov 20, 2015 at 2:14 PM, Bryan Jeffrey
wrote:
> The 1.5.2 Spark was compiled using the following options: mvn
> -Dhadoop.version=2.6.1 -Dscala-2.11 -DskipTests -Pyarn -Phive
>
Dan,
Even though you may be adding more nodes to the cluster, the Spark application
has to request additional executors in order to use the added
resources. Or the Spark application can use Dynamic Resource Allocation
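A minimal hedged sketch of enabling dynamic allocation (property names per the
Spark docs; the external shuffle service must also be running on the nodes, and
the min/max values are illustrative):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "2")
  .set("spark.dynamicAllocation.maxExecutors", "20")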
Suraj,
The Spark History Server runs on port 18080
(http://spark.apache.org/docs/latest/monitoring.html), which is not going to
give you a real-time update on a running Spark application. Given this is
Spark on YARN, you will need to view the Spark UI from the Application Master
URL, which can
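For reference, one hedged way to find that URL from the command line (the
application id is illustrative; the report YARN prints includes a Tracking-URL
field pointing at the Application Master):

yarn application -status application_1448000000000_0001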
1- Is there any way to make pairs of RDDs from a DStream:
DStream ---> DStream
so that I can use the already defined correlation function in Spark?
*Aim is to find the auto-correlation value in Spark. (As far as I know, Spark
Streaming does not support this.)*
--
Thanks &
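Not a streaming answer, but a hedged sketch of how one could pair a series
with its lag inside a single RDD and reuse MLlib's correlation (assuming
values: RDD[Double] whose zipWithIndex order is meaningful):

import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD

def lag1Autocorrelation(values: RDD[Double]): Double = {
  val indexed = values.zipWithIndex.map { case (v, i) => (i, v) }
  val lagged  = indexed.map { case (i, v) => (i + 1, v) } // shift by one
  val paired  = indexed.join(lagged).values               // (x_t, x_{t-1}) pairs
  Statistics.corr(paired.map(_._1), paired.map(_._2), "pearson")
}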
Hey everyone,
I have a Hive table that has a lot of small parquet files and I am creating
a data frame out of it to do some processing, but since I have a large
number of splits/files my job creates a lot of tasks, which I don't want.
Basically what I want is the same functionality that Hive
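A hedged sketch of one common way to cut the task count (the table name and
partition count are illustrative; coalesce reduces partitions without a full
shuffle):

val df = sqlContext.table("my_hive_table").coalesce(32)
// stages reading this data now run at most 32 tasks
println(df.rdd.partitions.length)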
From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: 20 November 2015 21:14
To: u...@hive.apache.org
Subject: starting spark-shell throws /tmp/hive on HDFS should be writable
error
Hi,
Has this been resolved? I don't think this has anything to do with the /tmp/hive
directory permissions.
Hi,
I have been running a Spark job on a standalone Spark cluster. I want to
kill the job using a Java API. I have the Spark job name and Spark
job id.
The REST POST call for killing the job is not working.
If anyone has explored this, please help me out.
--
Thanks and Regards,
Hokam Singh
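For reference, a hedged sketch of the kill call against the standalone master's
REST submission endpoint (this applies to drivers submitted in cluster mode
through that endpoint; the host, port 6066, and the submission id are
illustrative):

import java.net.{HttpURLConnection, URL}
import scala.io.Source

val submissionId = "driver-20151120123456-0001"
val url = new URL(s"http://spark-master:6066/v1/submissions/kill/$submissionId")
val conn = url.openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("POST")
// reading the response triggers the request; the body is a JSON status
println(Source.fromInputStream(conn.getInputStream).mkString)
conn.disconnect()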
Interesting, SPARK-3090 installs a shutdown hook for stopping the SparkContext.
FYI
On Fri, Nov 20, 2015 at 7:12 PM, Stéphane Verlet
wrote:
> I solved the first issue by adding a shutdown hook in my code. The
> shutdown hook gets called when you exit your script (Ctrl-C,
I tried adding a shutdown hook to my code but it didn't help. Still the same issue.
On Fri, Nov 20, 2015 at 7:08 PM, Ted Yu wrote:
> Which Spark release are you using ?
>
> Can you pastebin the stack trace of the process running on your machine ?
>
> Thanks
>
> On Nov 20, 2015,
Spark 1.4.1
On Friday, November 20, 2015, Ted Yu wrote:
> Which Spark release are you using ?
>
> Can you pastebin the stack trace of the process running on your machine ?
>
> Thanks
>
> On Nov 20, 2015, at 6:46 PM, Vikram Kone
I am not sure; I think it has to do with the signal sent to the process
and how the JVM handles it.
Ctrl-C sends a SIGINT, vs. a SIGTERM for the kill command.
On Fri, Nov 20, 2015 at 8:21 PM, Vikram Kone wrote:
> Thanks for the info Stephane.
> Why does CTRL-C in the
I do this in my stop script to kill the application: kill -s SIGTERM `pgrep
-f StreamingApp`
To stop it forcefully: pkill -9 -f "StreamingApp"
StreamingApp is the name of the class which I submitted.
I also have a shutdown hook thread to stop it gracefully.
sys.ShutdownHookThread {
logInfo("Gracefully
Hi,
I am exploring MLlib. I have taken the examples from MLlib and tried
to train an SVM model. I am getting an exception when I save the
trained model. When I run the code in local mode it works fine, but when I run
the MLlib example in standalone cluster mode it fails to save the model.
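A hedged sketch of the save/load calls (per the MLlib model export API;
trainingData and the HDFS path are illustrative - on a cluster the path must
be visible to all nodes, e.g. HDFS, not a local filesystem path):

import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}

val model = SVMWithSGD.train(trainingData, numIterations = 100)
model.save(sc, "hdfs:///models/svm")
val sameModel = SVMModel.load(sc, "hdfs:///models/svm")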
I solved the first issue by adding a shutdown hook in my code. The shutdown
hook gets called when you exit your script (Ctrl-C, kill … but not kill -9).
val shutdownHook = scala.sys.addShutdownHook {
  try {
    sparkContext.stop()
    // Make sure to kill any other threads or thread pool you may be
Thanks for the info Stephane.
Why does Ctrl-C in the terminal running spark-submit kill the app in the Spark
master correctly without any explicit shutdown hooks in the code? Can you
explain why we need to add the shutdown hook to kill it when launched via a
shell script?
For the second issue, I'm not
Which Spark release are you using ?
Can you pastebin the stack trace of the process running on your machine ?
Thanks
> On Nov 20, 2015, at 6:46 PM, Vikram Kone wrote:
>
> Hi,
> I'm seeing a strange problem. I have a spark cluster in standalone mode. I
> submit spark