This worked. Thanks for the tip Michael.
Thanks,
Muthu
On Thu, Feb 16, 2017 at 12:41 PM, Michael Armbrust wrote:
> The toString method of Dataset.queryExecution includes the various plans.
> I usually just log that directly.
>
> On Thu, Feb 16, 2017 at 8:26 AM, Muthu
Hi, I'm trying to create an application that would programmatically submit a jar
file to a Spark standalone cluster running on my local PC. However, I'm always
getting the error WARN TaskSetManager:66 - Lost task 1.0 in stage 0.0 (TID
1, 192.168.2.68, executor 0): java.lang.RuntimeException: Stream
When you say workers, are you using Spark Streaming? I'm not sure if this
will help, but there is an example of deploying a
RandomForestClassificationModel in Spark Streaming against Kafka that uses
createDataFrame here:
Thanks Hollin.
I will take a look at mleap and will let you know if I have any questions.
Jianhong
From: Hollin Wilkins [mailto:hol...@combust.ml]
Sent: Tuesday, February 14, 2017 11:48 PM
To: Jianhong Xia
Cc: Sumona Routh ; ayan guha
Maybe you can check this PR?
https://github.com/apache/spark/pull/16399
Thanks,
Xiao
2017-02-15 15:05 GMT-08:00 KhajaAsmath Mohammed :
> Hi,
>
> I am using spark temporary tables to write data back to hive. I have seen
> weird behavior of .hive-staging files after
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
/checkpoint/11ea8862-122c-4614-bc7e-f761bb57ba23/rdd-347/.part-1-attempt-3
could only be replicated to 0 nodes instead of minReplication (=1). There
are 0 datanode(s) running and no node(s) are excluded in this
Thanks, Sam. I will have a look at it.
On Feb 16, 2017 10:06 PM, "Sam Elamin" wrote:
> I recommend running Spark in local mode when you're first debugging your
> code, just to understand what's happening and step through it, perhaps catch
> a few errors when you first
I recommend running Spark in local mode when you're first debugging your code,
just to understand what's happening and step through it, perhaps catch a
few errors when you first start off.
I personally use IntelliJ because it's my preference. You can follow this
guide.
Hi,
I was looking for some URLs/documents for getting started on debugging
Spark applications.
I prefer developing Spark applications with Scala on Eclipse and then
packaging the application jar before submitting.
Kind regards,
Reza
The toString method of Dataset.queryExecution includes the various plans.
I usually just log that directly.
On Thu, Feb 16, 2017 at 8:26 AM, Muthu Jayakumar wrote:
> Hello there,
>
> I am trying to write to log-line a dataframe/dataset queryExecution and/or
> its logical
I'm getting errors when I try to run my Docker container in bridge networking
mode on Mesos.
Here is my spark submit script
/spark/bin/spark-submit \
--class com.package.MySparkJob \
--name My-Spark-Job \
--files /path/config.cfg, ${JAR} \
--master ${SPARK_MASTER_HOST} \
--deploy-mode
Hi,
I am trying to do topic modeling in Spark using Spark's LDA package, with
Spark 2.0.2 and the PySpark API.
I ran the code as below:
from pyspark.ml.clustering import LDA
lda = LDA(featuresCol="tf_features", k=10, seed=1, optimizer="online")
ldaModel = lda.fit(tf_df)
Dear spark users,
Is there any mechanism in Spark that does not guarantee idempotent
execution? For example, with stragglers, the framework might start another task
assuming the straggler is slow while the straggler is still running. This
would be annoying sometimes when, say, the task is writing to
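What is described here is Spark's speculative execution: a backup copy of a straggling task may be launched, so a task's side effects (for example, external writes) are not guaranteed to run exactly once. A minimal sketch of the relevant settings; the values shown are illustrative, not tuned recommendations:

```properties
# spark-defaults.conf -- illustrative values
spark.speculation            true   # allow backup copies of slow tasks
spark.speculation.multiplier 1.5    # a task is "slow" if it runs > 1.5x the median
spark.speculation.quantile   0.75   # fraction of tasks that must finish before checking
```

Setting spark.speculation to false avoids duplicate attempts at the cost of waiting on stragglers; making the write idempotent is the usual safeguard when speculation is enabled.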
Hi,
Thanks for your kind response. Hashing the key with random numbers increases
the processing time. My entire join for the entire month
finishes within 150 seconds for 471 million records and then stays for
another 6 minutes for 55 million records.
Using hash keys increases the
Hello,
hiveSqlContext.sql(scala.io.Source.fromFile(args(0).toString()).mkString).collect()
I have a file on my local system,
and I am running spark-submit in cluster deploy mode on Hadoop,
so should args(0) be on the Hadoop cluster or local?
What should the protocol be, file:///?
And for Hadoop, what is the
You can also do something similar to what is mentioned in [1].
The basic idea is to use two hash functions for each key and assign it
to the less loaded of the two hashed workers.
Cheers,
Anis
[1].
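The "two hash functions, least loaded" idea above (the power of two choices) can be sketched in plain Python; the hash salts and worker count below are made-up illustrations, not anything from Spark itself:

```python
import hashlib

def h(key: str, salt: str, n_workers: int) -> int:
    """Map a key to a worker index using a salted hash."""
    digest = hashlib.md5((salt + key).encode()).hexdigest()
    return int(digest, 16) % n_workers

def assign(keys, n_workers=4):
    """Assign each key to the less loaded of its two candidate workers."""
    load = [0] * n_workers
    placement = {}
    for key in keys:
        a, b = h(key, "salt1", n_workers), h(key, "salt2", n_workers)
        chosen = a if load[a] <= load[b] else b
        load[chosen] += 1
        placement[key] = chosen
    return placement, load

placement, load = assign(["key%d" % i for i in range(1000)])
# load stays close to uniform across the workers, even though either
# hash on its own would give an uneven spread
```

Picking the less loaded of two random candidates is what keeps the maximum load low; a single hash function has no such correction step.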
Yes. You have to change your key, or, in big data terms, "add salt".
Yong
From: Gourav Sengupta
Sent: Thursday, February 16, 2017 11:11 AM
To: user
Subject: skewed data in join
Hi,
Is there a way to do multiple reducers for joining
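The "adding salt" technique can be sketched in plain Python (the key names, data, and salt count below are invented for illustration): spread the hot key over several salted sub-keys on the skewed side, replicate the other side once per salt value, and join on the salted key, so no single reducer sees all of the hot rows.

```python
import random

N_SALTS = 4  # illustrative; tune to the actual skew

# Skewed "large" side: one hot key dominates.
large = [("hot", v) for v in range(100)] + [("cold", 1)]
# "Small" side to join against.
small = {"hot": "H", "cold": "C"}

# Salt the large side: spread the hot key over N_SALTS sub-keys.
salted_large = [((k, random.randrange(N_SALTS)), v) for k, v in large]
# Replicate the small side once per salt so every sub-key finds a match.
salted_small = {(k, s): v for k, v in small.items() for s in range(N_SALTS)}

# The join now runs on (key, salt) pairs instead of the raw key.
joined = [(k, v, salted_small[(k, s)]) for (k, s), v in salted_large]
print(len(joined))  # 101 rows: same result as the unsalted join
```

The cost is replicating the small side N_SALTS times; the benefit is that the hot key's rows are split across N_SALTS reducers instead of one.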
Hello there,
I am trying to write to log-line a dataframe/dataset queryExecution and/or
its logical plan. The current code...
def explain(extended: Boolean): Unit = {
  val explain = ExplainCommand(queryExecution.logical, extended = extended)
Hi,
Is there a way to do multiple reducers for joining on skewed data?
Regards,
Gourav
My problem is quite simple: the JVM runs out of memory during model =
dt.fit(train_small). My train_small dataset contains only 100 rows (I have
limited the number of rows to make sure the size of the dataset doesn't cause
the memory overflow). But each row has a column all_features with a long
Thanks, that worked for me. Previously I was using the wrong join; that's the
reason it did not work for me.
Thanks
On Feb 16, 2017 01:20, "Sam Elamin" wrote:
> You can do a join or a union to combine all the dataframes to one fat
> dataframe
>
> or do a select on the columns