Michael,
I don't know what your environment is, but if it's Cloudera, you should be able
to see the link to your master in Hue.
Thanks
On Thursday, January 7, 2016 5:03 PM, Michael Pisula
wrote:
I had tried several parameters, including
Or he can also transform the whole date into a string
On Thursday, January 7, 2016 2:25 PM, Sujit Pal
wrote:
Hi Jorge,
Maybe extract things like dd, mm, day of week, time of day from the datetime
string and use them as features?
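Something along these lines, as a rough sketch (the "yyyy-MM-dd-HH:mm" pattern is an
assumption based on the sample data later in this thread; adjust it to your data):

import java.text.SimpleDateFormat
import java.util.Calendar

// Parse the datetime string and pull out a few calendar fields as features.
def dateFeatures(s: String): Array[Double] = {
  val cal = Calendar.getInstance()
  cal.setTime(new SimpleDateFormat("yyyy-MM-dd-HH:mm").parse(s))
  Array(
    cal.get(Calendar.MONTH) + 1,      // month (1-12)
    cal.get(Calendar.DAY_OF_MONTH),   // day of month
    cal.get(Calendar.DAY_OF_WEEK),    // day of week
    cal.get(Calendar.HOUR_OF_DAY)     // hour of day
  ).map(_.toDouble)
}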
-sujit
On Thu, Jan 7, 2016 at
Right... if you are using the GitHub version, just modify the ReceiverLauncher
and add that. I will fix it for Spark 1.6 and release a new version in
spark-packages for Spark 1.6.
Dibyendu
On Thu, Jan 7, 2016 at 4:14 PM, Ted Yu wrote:
> I cloned
Hi All,
I am currently using Spark 1.4.0 and have a requirement to add a column
with sequential numbering to an existing DataFrame.
I understand the window function "rowNumber" serves my purpose,
hence I have the import statement below to include it:
import org.apache.spark.sql.expressions.Window
Ok, enuf! :) Leaving the room for now as I'm like a copycat :)
https://en.wiktionary.org/wiki/enuf
Regards,
Jacek
Jacek Laskowski | https://medium.com/@jaceklaskowski/
Mastering Apache Spark
==> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Follow me at
Some discussion is there in https://github.com/dibbhatt/kafka-spark-consumer
and some is mentioned in https://issues.apache.org/jira/browse/SPARK-11045
Let me know if those answer your question.
In short, Direct Stream is a good choice if you need exactly-once semantics and
message ordering, but
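For reference, creating a direct stream with the Spark 1.x Kafka integration looks
roughly like this (a hedged sketch; ssc, the broker list, and the topic name are
placeholders):

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Direct approach: Spark tracks Kafka offsets itself instead of using receivers.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events"))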
Please take a look at the following for a sample of how rowNumber is used:
https://github.com/apache/spark/pull/9050
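In short, it boils down to something like this (a hedged sketch assuming a DataFrame
df with a sortable column "id"; note that in 1.4 window functions need a HiveContext,
and an un-partitioned ordered window pulls all rows into one partition):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rowNumber

// Number the rows sequentially by the ordering of "id".
val w = Window.orderBy("id")
val numbered = df.withColumn("seqNo", rowNumber().over(w))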
BTW 1.4.0 was an old release.
Please consider upgrading.
On Thu, Jan 7, 2016 at 3:04 AM, satish chandra j
wrote:
> HI All,
> Currently using Spark 1.4.0
Hi,
Can Spark, using HiveContext external tables, read sub-directories?
Example:
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql._
import sqlContext.implicits._
//prepare data and create subdirectories with parquet
val df = Seq("id1" -> 1, "id2" -> 4, "id3"->
Hi,
We have been using Spark Streaming for a little while now.
Until now, we were running our Spark Streaming jobs on Spark 1.5.1 and they
were working well. Yesterday, we upgraded to Spark 1.6.0 without any changes
to the code, but our streaming jobs are no longer working. We are
getting an
Hi, I wanted to try the 1.6.0 version of Spark, but when I run it on my
local machine, it throws this exception:
java.lang.RuntimeException: java.lang.RuntimeException: The root scratch
dir: /tmp/hive on HDFS should be writable.
The thing is, this problem happened to me in the 1.5.1
Hi, thanks for the response. Each job processes around 5 GB of skewed
data, does a group by on multiple fields, does aggregation, then
coalesce(1), and saves a CSV file in gzip format. I think coalesce is causing the
problem, but the data is not that huge, so I don't understand why it keeps
running for an
I sorted this out. There were 2 different versions of Derby, and ensuring the
metastore and Spark used the same version of Derby made the problem go away.
Deenar
On 6 January 2016 at 02:55, Yana Kadiyska wrote:
> Deenar, I have not resolved this issue. Why do you think
According to https://spark.apache.org/docs/latest/security.html#web-ui ,
web UI is covered.
FYI
On Thu, Jan 7, 2016 at 6:35 AM, Kostiantyn Kudriavtsev <
kudryavtsev.konstan...@gmail.com> wrote:
> hi community,
>
> do I understand correctly that spark.ui.filters property sets up filters
> only
Did you try the --jars property in spark-submit? If your jar is huge,
you can pre-load the jar on all executors in a commonly available directory
to avoid network IO.
On Thu, Jan 7, 2016 at 4:03 PM, Ophir Etzion wrote:
> I' trying to add jars before running a query
First, extract year, month, day, and time from the datetime.
Then decide which variables can be treated as categorical features,
such as year/month/day, and encode them to boolean form using OneHotEncoder.
Finally, use VectorAssembler to assemble the encoded output vectors and the
other raw
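As a rough sketch of that encode-and-assemble step (column names are made up, and it
assumes numeric month/day/hour columns were already extracted into a DataFrame df):

import org.apache.spark.ml.feature.{OneHotEncoder, VectorAssembler}

// One-hot encode a categorical column (values must already be numeric indices).
val monthEncoder = new OneHotEncoder()
  .setInputCol("month")
  .setOutputCol("monthVec")

// Assemble the encoded vector and the remaining raw columns into one feature vector.
val assembler = new VectorAssembler()
  .setInputCols(Array("monthVec", "day", "hour"))
  .setOutputCol("features")

val features = assembler.transform(monthEncoder.transform(df))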
You may need to add a
createDataFrame (for Python, inferSchema) call before registerTempTable.
Thanks,
Prem
On Thu, Jan 7, 2016 at 12:53 PM, Henrik Baastrup <
henrik.baast...@netscout.com> wrote:
> Hi All,
>
> I have a small Hadoop cluster where I have stored a lot of data in parquet
>
Are you running standalone in local mode or in cluster mode? Executor and
driver existence differ based on the setup type. A snapshot of your environment
UI would be helpful to say.
On Thu, Jan 7, 2016 at 11:51 AM, wrote:
> Hi,
>
>
>
> After I called rdd.persist(*MEMORY_ONLY_SER*), I
I have a need to route the DStream through the streaming pipeline by some key,
such that data with the same key always goes through the same executor.
There doesn't seem to be a way to do manual routing with Spark Streaming. The
closest I can come up with is:
stream.foreachRDD {rdd =>
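(The snippet above is cut off; a hedged way to fill in the shape, where stream is a
DStream of key/value pairs and numPartitions and process are placeholders. As noted
downthread, this only co-locates keys within a batch, not on a fixed executor across
batches.)

import org.apache.spark.HashPartitioner

stream.foreachRDD { rdd =>
  rdd.partitionBy(new HashPartitioner(numPartitions))
    .foreachPartition { iter =>
      // All records sharing a key land in the same partition for this batch.
      iter.foreach { case (k, v) => process(k, v) }
    }
}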
Thanks for the reply, Tathagata. Our pipeline has a rather fat state and that's
why we have custom failure handling that kills all executors and goes back to a
certain point in time in the past.
On a separate but related note, I noticed that in a chained map job, the entire
pipeline runs on the
(following up a rather old thread:)
Hi Christopher,
I understand how you might use nearest neighbors for item-item
recommendations, but how do you use it for top N items per user?
Thanks!
Apu
Hi Yanbo,
I was able to successfully perform logistic regression on my data and also
performed the cross validation and it all worked fine.
Thanks
Sent from my Sony Xperia™ smartphone
Yanbo Liang wrote
>Hi Chandan,
>
>
>Do you mean to run your own LR algorithm based on SparkR?
>
Thanks Michael for replying. Aggregator/UDAF is exactly what I am looking
for, but we are still on 1.4 and it's going to take time to get to 1.6.
On Wed, Jan 6, 2016 at 10:32 AM, Michael Armbrust
wrote:
> In Spark 1.6 GroupedDataset
>
Yes, you can do it unless the method is marked static/final.
Most of the methods in SparkContext are marked static, so you definitely can't
override them; otherwise, overriding would usually work.
Thanks
Deepak
On Fri, Jan 8, 2016 at 12:06 PM, yuliya Feldman wrote:
>
You can try it.
> On January 8, 2016, at 14:44, yuliya Feldman wrote:
>
> invoked
Hi,
Can anybody please guide me on how we can generate recommendations for
a user using Spark?
Regards,
Anjali Gautam
Use the Spark MLlib k-means algorithm to generate recommendations
On Jan 8, 2016 12:41 PM, anjali gautam wrote:
Hi,
Can anybody please guide me on how we can generate recommendations for
a user using Spark?
Regards,
Anjali Gautam
Maybe you want to convert the date to a duration in the form of a number of
hours/days and then do the calculation on it.
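A hedged sketch of that conversion (the origin date and the timestamp pattern are
arbitrary choices):

val fmt = new java.text.SimpleDateFormat("yyyy-MM-dd-HH:mm")
val originMillis = fmt.parse("2015-01-01-00:00").getTime

// Hours elapsed since the chosen origin; divide by 24 for days.
def hoursSinceOrigin(s: String): Double =
  (fmt.parse(s).getTime - originMillis) / (1000.0 * 60 * 60)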
On Jan 8, 2016 12:39 AM, Jorge Machado wrote:
Hello all,
I'm new to machine learning. I'm trying to predict some electric usage with a
decision tree.
The
Hello,
I am new to Spark and have a most likely basic question - can I override a
method from SparkContext?
Thanks
For example, to add some functionality there.
I understand I can extend SparkContext via an implicit class to add new
methods that can be invoked on SparkContext, but I want to see if I can
override an existing one.
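For what it's worth, a minimal sketch of the subclassing route (assuming the method
you pick, here setJobDescription, is not final in your Spark version):

import org.apache.spark.{SparkConf, SparkContext}

class MySparkContext(conf: SparkConf) extends SparkContext(conf) {
  // Override an existing, non-final method and delegate to the original.
  override def setJobDescription(value: String): Unit = {
    println(s"setting job description: $value")
    super.setJobDescription(value)
  }
}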
From: censj
To: yuliya Feldman
Hi kdmxen, you want to delete the broadcast variables on the executors to avoid
executor-lost failures, right? Have you tried using the unpersist method? Like
this: itemSplitBroadcast.destroy(true) =>
itemSplitBroadcast.unpersist(true)
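In other words, something like this (hedged; unpersist removes the executor-side
copies but keeps the broadcast reusable, while destroy releases it for good):

itemSplitBroadcast.unpersist(true)  // blocking removal of the copies on the executors
// itemSplitBroadcast.destroy()     // only if the broadcast will never be used again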
LIN Chen
Date: Thu, 7 Jan 2016 22:01:27 +0800
Subject:
Thank you
From: Deepak Sharma
To: yuliya Feldman
Cc: "user@spark.apache.org"
Sent: Thursday, January 7, 2016 10:41 PM
Subject: Re: Newbie question
Yes, you can do it unless the method is marked
The question itself is very vague.
You might want to use this slide as a starting point
http://www.slideshare.net/CasertaConcepts/analytics-week-recommendations-on-spark.
From: anjali gautam [mailto:anjali.gauta...@gmail.com]
Sent: Friday, January 08, 2016 12:42 PM
To: user@spark.apache.org
Hi all,
I am trying to start Solr with a custom plugin which uses the Spark library. I
am trying to initialize a SparkContext in local mode. I have made a fat jar
for this plugin using the Maven Shade plugin and put it in the lib directory.
While starting Solr, it is not able to initialize the SparkContext. It says
If the method is not final or static, then you can
On Jan 8, 2016 12:07 PM, yuliya Feldman wrote:
Hello,
I am new to Spark and have a most likely basic question - can I override a
method from SparkContext?
Thanks
Alternating least squares takes an RDD of (user, product, rating) tuples,
and the resulting model provides predict(user, product) and recommendProducts
methods, among others.
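A hedged sketch of that flow with MLlib's ALS (ratings, userId, and productId are
placeholders):

import org.apache.spark.mllib.recommendation.{ALS, Rating}

// ratings: RDD[Rating], i.e. (user, product, rating) triples
val model = ALS.train(ratings, 10, 10, 0.01)    // rank, iterations, lambda

val score = model.predict(userId, productId)    // predicted rating for one pair
val topN  = model.recommendProducts(userId, 10) // top 10 products for one user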
Hi, we made the change because the partitioning discovery logic was too
flexible and it introduced problems that were very confusing to users. To
make your case work, we have introduced a new data source option called
basePath. You can use
DataFrame df =
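(The snippet is cut off; in Scala the idea is roughly as below, a hedged sketch with
a made-up directory layout.)

// Table root is /data/table, with partition directories like /data/table/key=1
val df = sqlContext.read
  .option("basePath", "/data/table")
  .parquet("/data/table/key=1")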
Hi Yin, thanks much your answer solved my problem. Really appreciate it!
Regards
On Fri, Jan 8, 2016 at 1:26 AM, Yin Huai wrote:
> Hi, we made the change because the partitioning discovery logic was too
> flexible and it introduced problems that were very confusing to
No problem! Glad it helped!
On Thu, Jan 7, 2016 at 12:05 PM, Umesh Kacha wrote:
> Hi Yin, thanks much your answer solved my problem. Really appreciate it!
>
> Regards
>
>
> On Fri, Jan 8, 2016 at 1:26 AM, Yin Huai wrote:
>
>> Hi, we made the change
Hello all,
I'm new to machine learning. I'm trying to predict some electric usage with a
decision tree.
The data is:
2015-12-10-10:00, 1200
2015-12-11-10:00, 1150
My question is: what is the best way to turn the date and time into features in my
vector?
Something like this: Vector(1200,
I tried to build 1.6.0 for YARN and Scala 2.11, but got an error. Any help is
appreciated.
[warn] Strategy 'first' was applied to 2 files
[info] Assembly up to date:
/Users/lin/git/spark/network/yarn/target/scala-2.11/spark-network-yarn-1.6.0-hadoop2.7.1.jar
java.lang.IllegalStateException:
Hi All,
I have a small Hadoop cluster where I have stored a lot of data in parquet
files. I have installed a Spark master service on one of the nodes and now
would like to query my parquet files from a Spark client. When I run the
following program from the spark-shell on the Spark Master node
Hi,
I start the cluster using the spark-ec2 scripts, so the cluster is in
stand-alone mode.
Here is how I submit my job:
spark/bin/spark-submit --class demo.spark.StaticDataAnalysis --master
spark://:6066 --deploy-mode cluster demo/Demo-1.0-SNAPSHOT-all.jar
Cheers,
Michael
On 07.01.2016 22:41,
Read about --total-executor-cores.
Not sure why you specify port 6066 for the master... usually it's 7077.
Verify in the master UI (usually port 8080) how many cores there are (depends on
other configs, but usually workers connect to the master with all their cores).
On 7 January 2016 at 23:46, Michael Pisula
Can I do it without Kerberos and Hadoop?
Ideally using filters, as for the job UI.
On Jan 7, 2016, at 1:22 PM, Prem Sure wrote:
> you can refer more on https://searchcode.com/codesearch/view/97658783/
>
You cannot guarantee that each key will forever be on the same executor.
That is a flawed approach to designing an application if you have to
ensure fault tolerance toward executor failures.
On Thu, Jan 7, 2016 at 9:34 AM, Lin Zhao wrote:
> I have a need to route the
Without kerberos you don't have true security.
Cheers
On Thu, Jan 7, 2016 at 1:56 PM, Kostiantyn Kudriavtsev <
kudryavtsev.konstan...@gmail.com> wrote:
> can I do it without kerberos and hadoop?
> ideally using filters as for job UI
>
> On Jan 7, 2016, at 1:22 PM, Prem Sure
Attached is a screenshot of the Storage tab details for the cached RDD. The
host highlighted at the end of the list is the driver machine.
-regards
Seemanto Barua
From: Barua, Seemanto (US)
Sent: Thursday, January 07, 2016 12:43 PM
To:
I had tried several parameters, including --total-executor-cores, with no effect.
As for the port, I tried 7077, but if I remember correctly I got some
kind of error that suggested to try 6066, with which it worked just fine
(apart from this issue here).
Each worker has two cores. I also tried
I know, but I only need to hide/protect the web UI, at least with the servlet/filter API.
On Jan 7, 2016, at 4:59 PM, Ted Yu wrote:
> Without kerberos you don't have true security.
>
> Cheers
>
> On Thu, Jan 7, 2016 at 1:56 PM, Kostiantyn Kudriavtsev
>
Do you see in the master UI that the workers are connected to the master, and
that before you run your app there are 2 available cores per worker in the
master UI?
I understand that there are 2 cores on each worker; the question is whether
they got registered under the master.
Regarding the port, it's very strange; please
All the workers were connected, I even saw the job being processed on
different workers, so that was working fine.
I will fire up the cluster again tomorrow and post the results of
connecting to 7077 and using --total-executor-cores 4.
Thanks for the help
On 07.01.2016 23:10, Igor Berman wrote:
Share how you submit your job
and what cluster (YARN, standalone).
On 7 January 2016 at 23:24, Michael Pisula
wrote:
> Hi there,
>
> I ran a simple Batch Application on a Spark Cluster on EC2. Despite having
> 3
> Worker Nodes, I could not get the application processed on
I'm trying to add jars before running a query using Hive on Spark on CDH
5.4.3.
I've tried applying the patch in
https://issues.apache.org/jira/browse/HIVE-12045 (manually, as the patch is
for a different Hive version) but still haven't succeeded.
Did anyone manage to do ADD JAR successfully
Hi there,
I ran a simple batch application on a Spark cluster on EC2. Despite having 3
worker nodes, I could not get the application processed on more than one
node, regardless of whether I submitted the application in cluster or client mode.
I also tried manually increasing the number of partitions in
Hello,
When I try to submit a Python job using spark-submit (using --master yarn
--deploy-mode cluster), I get the following error:
Traceback (most recent call last):
File "loss_rate_by_probe.py", line 15, in ?
from pyspark import SparkContext
File