Thanks for the reply.
Asher, have you experienced this problem when checkpoints are not enabled as
well? If we have a large number of iterations (over 150) and checkpoints are
not enabled, the process just hangs (without any error) at around iteration
120-140 (on Spark 2.0.0). I could not reproduce this
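With long loops like that, the usual suspect is unbounded lineage; a minimal
PySpark sketch of periodic checkpointing (the directory, interval, and toy loop
are placeholders of mine, not from this thread):

    from pyspark import SparkContext

    sc = SparkContext(appName="iterative-job")
    sc.setCheckpointDir("hdfs:///tmp/checkpoints")  # needs a reliable filesystem

    rdd = sc.parallelize(range(1000))
    for i in range(200):
        rdd = rdd.map(lambda x: x + 1)
        if i % 10 == 0:
            rdd.checkpoint()  # truncate the lineage every 10 iterations
            rdd.count()       # checkpointing only happens on materialization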
Check out this SO
thread:
http://stackoverflow.com/questions/13624893/metastore-db-created-wherever-i-run-hive
On Sat, May 16, 2015 at 9:07 AM, Tamas Jambor jambo...@gmail.com wrote:
Gave it another try - it seems that it picks up the variable and prints out
the correct value, but still puts the metastore_db folder in the current
directory, regardless.
On Sat, May 16, 2015 at 1:13 PM, Tamas Jambor jambo...@gmail.com wrote:
Thank you for the reply.
I have tried your
, 2015 at 2:03 PM, Tamas Jambor jambo...@gmail.com wrote:
Thanks for the reply. I am trying to use it without a Hive setup
(Spark standalone), so it prints something like this:
hive_ctx.sql("show tables").collect()
15/05/15 17:59:03 INFO HiveMetaStore: 0: Opening raw store with
implemenation
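(For context, the hive_ctx above would be created roughly like this with the
Spark 1.x API; a sketch, not the exact code from the thread:)

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="hive-test")
    hive_ctx = HiveContext(sc)
    hive_ctx.sql("show tables").collect()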
in your
hive-site.xml file. I have not mucked with warehouse.dir too much but I
know that the value of the metastore URI is in fact picked up from there as
I regularly point to different systems...
On Thu, May 14, 2015 at 6:26 PM, Tamas Jambor jambo...@gmail.com wrote:
I have tried to put the hive-site.xml file in the conf/ directory, but it
seems it is not picking it up from there.
On Thu, May 14, 2015 at 6:50 PM, Michael Armbrust mich...@databricks.com
wrote:
You can configure Spark SQL's Hive interaction by placing a hive-site.xml
file in the conf/ directory.
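For reference, a minimal hive-site.xml that pins the Derby metastore to a
fixed location (the path is a placeholder; by default Derby creates
metastore_db in whatever directory you launch from):

    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:derby:;databaseName=/var/lib/spark/metastore_db;create=true</value>
      </property>
    </configuration>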
Not sure what would slow it down, as the repartition completes equally fast
on all nodes (implying that the data is available on all of them); after that
there are a few computation steps, none of them local to the master.
On Mon, Apr 20, 2015 at 12:57 PM, Sean Owen so...@cloudera.com wrote:
What machines are
It is just a comma-separated file, about 10 columns wide, to which we append
a unique id and a few additional values.
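(Not from the thread, but a sketch of how one might append such an id in
PySpark; the file path is hypothetical:)

    from pyspark import SparkContext

    sc = SparkContext(appName="append-id")
    lines = sc.textFile("hdfs:///data/input.csv")
    # zipWithUniqueId pairs each line with a unique (not consecutive) long
    with_id = lines.zipWithUniqueId().map(lambda p: "%s,%d" % (p[0], p[1]))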
On Fri, Mar 27, 2015 at 2:43 PM, Ted Yu yuzhih...@gmail.com wrote:
jamborta:
Please also describe the format of your csv files.
Cheers
On Fri, Mar 27, 2015 at 6:42 AM, DW
create and manage
multiple streams - the same way that is possible with batch jobs.
On Mon, Mar 2, 2015 at 2:52 PM, Sean Owen so...@cloudera.com wrote:
I think everything there is to know about it is on JIRA; I don't think
that's being worked on.
On Mon, Mar 2, 2015 at 2:50 PM, Tamas Jambor jambo
I have seen there is a card (SPARK-2243) to enable that. Is that still
going ahead?
On Mon, Mar 2, 2015 at 2:46 PM, Sean Owen so...@cloudera.com wrote:
It is still not something you're supposed to do; in fact there is a
setting (disabled by default) that throws an exception if you try to
make
spark-jobserver for that).
On Mon, Mar 2, 2015 at 3:07 PM, Sean Owen so...@cloudera.com wrote:
You can make a new StreamingContext on an existing SparkContext, I believe?
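In PySpark that would look something like this (a sketch; the batch interval
is arbitrary):

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="reuse-context")
    ssc = StreamingContext(sc, 10)  # built on the existing SparkContext, 10s batches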
On Mon, Mar 2, 2015 at 3:01 PM, Tamas Jambor jambo...@gmail.com wrote:
Thanks for the reply.
Actually, our main
Thanks for the reply. I am trying to set up a streaming-as-a-service
approach, using the framework that is used for spark-jobserver. For that I
would need to handle asynchronous operations that are initiated from
outside of the stream. Do you think that is not possible?
On Fri Feb 13 2015 at
Thanks for the reply. It seems it is all set to zero in the latest code - I
was checking 1.2 last night.
On Fri Feb 06 2015 at 07:21:35 Sean Owen so...@cloudera.com wrote:
It looks like the initial intercept term is 1 only in the addIntercept &&
numOfLinearPredictor == 1 case. It does seem
Thanks for the reply. I tried adding SPARK_CLASSPATH but got a warning that
it is deprecated (and it didn't solve the problem); I also tried running with
--driver-class-path, which did not work either. I am trying this locally.
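For reference, this is the shape of invocation I mean (the jar path and
script name are placeholders):

    spark-submit --driver-class-path /path/to/extra.jar my_app.py
    # to also ship the jar to executors, pass it explicitly:
    spark-submit --jars /path/to/extra.jar my_app.py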
On Mon Jan 26 2015 at 15:04:03 Akhil Das ak...@sigmoidanalytics.com
Thanks for the replies.
Is this something we can get around? I tried to hack into the code without
much success.
On Wed, Jan 21, 2015 at 3:15 AM, Shao, Saisai saisai.s...@intel.com wrote:
Hi,
I don't think current Spark Streaming supports this feature; all the
DStream lineage is fixed after
to manage job
lifecycle. You will still need to solve the dynamic configuration through
some alternative channel.
of passing a
configuration.
-kr, Gerard.
On Wed, Jan 21, 2015 at 11:54 AM, Tamas Jambor jambo...@gmail.com wrote:
We were thinking along the same lines, that is, to fix the number of
streams and change the input and output channels dynamically,
but could not make it work (seems
Thanks. The problem is that we'd like it to be picked up by Hive.
On Tue Jan 13 2015 at 18:15:15 Davies Liu dav...@databricks.com wrote:
On Tue, Jan 13, 2015 at 10:04 AM, jamborta jambo...@gmail.com wrote:
Hi all,
Is there a way to save DStream RDDs to a single file so that another
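(One common workaround, not from this thread, is to collapse each batch to one
partition before writing; note each batch still lands in its own directory.
A sketch, with the source and paths hypothetical:)

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="single-file-out")
    ssc = StreamingContext(sc, 10)
    dstream = ssc.socketTextStream("localhost", 9999)  # placeholder source

    def save_single(rdd):
        if not rdd.isEmpty():
            # one partition means one part-00000 file per batch directory
            rdd.coalesce(1).saveAsTextFile("hdfs:///out/batch-%d" % rdd.id())

    dstream.foreachRDD(save_single)
    ssc.start()
    ssc.awaitTermination()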
Thanks. Will it work with sbt at some point?
On Thu, 13 Nov 2014 01:03 Xiangrui Meng men...@gmail.com wrote:
You need to use Maven to include Python files. See
https://github.com/apache/spark/pull/1223 . -Xiangrui
On Wed, Nov 12, 2014 at 4:48 PM, jamborta jambo...@gmail.com wrote:
I have
Thanks for the reply, Sean.
I can see that splitting on all the categories would probably overfit
the tree; on the other hand, it might give more insight into the
subcategories (it would probably only work if the data is uniformly
distributed between the categories).
I haven't really found any
Hi Xiangrui,
Thanks for the reply. Is this still due to be released in 1.2
(SPARK-3530 is still open)?
Thanks,
On Wed, Nov 5, 2014 at 3:21 AM, Xiangrui Meng men...@gmail.com wrote:
The proposed new set of APIs (SPARK-3573, SPARK-3530) will address
this issue. We carry over extra columns with
That would work - I normally use Hive queries through Spark SQL, and I
have not seen something like that there.
On Thu, Oct 2, 2014 at 3:13 PM, Ashish Jain ashish@gmail.com wrote:
If you are using textFile() to read data in, it also takes a parameter for
the minimum number of partitions to
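(i.e. something along these lines; the path and count are placeholders:)

    from pyspark import SparkContext

    sc = SparkContext(appName="read-with-partitions")
    # ask for at least 100 partitions when reading the file
    rdd = sc.textFile("hdfs:///data/input.csv", minPartitions=100)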
Thanks, Marcelo.
What's the reason it is not possible in cluster mode either?
On Wed, Oct 1, 2014 at 5:42 PM, Marcelo Vanzin van...@cloudera.com wrote:
You can't set up the driver memory programmatically in client mode. In
that mode, the same JVM is running the driver, so you can't modify
the --driver-memory command line option if
you are using Spark submit (bin/pyspark goes through this path, as you have
discovered on your own).
Does that make sense?
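Concretely, something like this (the script name is a placeholder):

    spark-submit --driver-memory 4g my_app.py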
2014-10-01 10:17 GMT-07:00 Tamas Jambor jambo...@gmail.com:
When you say "respective backend code to launch it", I thought
Thanks for the reply.
As I mentioned above, everything works in yarn-client mode; the problem
starts when I try to run it in yarn-cluster mode.
(It seems that spark-shell does not work in yarn-cluster mode, so I cannot
debug that way.)
On Mon, Sep 29, 2014 at 7:30 AM, Akhil Das ak...@sigmoidanalytics.com
Thank you.
Where is the number of containers set?
On Thu, Sep 25, 2014 at 7:17 PM, Marcelo Vanzin van...@cloudera.com wrote:
On Thu, Sep 25, 2014 at 8:55 AM, jamborta jambo...@gmail.com wrote:
I am running Spark with the default settings in yarn-client mode. For some
reason YARN always
Hi Davies,
Thanks for the reply. I saw that you do it that way in the code. Is
there no other way?
I have implemented all the predict functions in Scala, so I would prefer
not to reimplement the whole thing in Python.
Thanks,
On Tue, Sep 23, 2014 at 5:40 PM, Davies Liu dav...@databricks.com