Re: example LDA code ClassCastException

2016-11-04 Thread Tamas Jambor
Thanks for the reply. Asher, have you experienced problems when checkpoints are not enabled as well? If we have a large number of iterations (over 150) and checkpoints are not enabled, the process just hangs (without any error) at around iteration 120-140 (on Spark 2.0.0). I could not reproduce this
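For reference, checkpointing for MLlib's LDA is driven by two pieces: a checkpoint directory on the SparkContext and the checkpointInterval training parameter. A minimal PySpark sketch (the directory path and toy corpus are placeholders, not from the thread):

    from pyspark import SparkContext
    from pyspark.mllib.clustering import LDA
    from pyspark.mllib.linalg import Vectors

    sc = SparkContext(appName="lda-checkpoint-example")
    # The checkpoint directory must be set before training starts.
    sc.setCheckpointDir("hdfs:///tmp/lda-checkpoints")

    # Toy corpus: (document id, term-count vector) pairs.
    corpus = sc.parallelize([
        (0, Vectors.dense([1.0, 2.0, 6.0])),
        (1, Vectors.dense([1.0, 3.0, 0.0])),
        (2, Vectors.dense([1.0, 4.0, 1.0])),
    ])

    # checkpointInterval truncates the RDD lineage every N iterations, which
    # is what keeps long runs (150+ iterations) from hanging.
    model = LDA.train(corpus, k=2, maxIterations=150, checkpointInterval=10)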

Re: store hive metastore on persistent store

2015-05-16 Thread Tamas Jambor
Check out this SO thread: http://stackoverflow.com/questions/13624893/metastore-db-created-wherever-i-run-hive On Sat, May 16, 2015 at 9:07 AM, Tamas Jambor jambo...@gmail.com wrote: Gave it another try - it seems that it picks up the variable and prints out the correct value, but still puts

Re: store hive metastore on persistent store

2015-05-16 Thread Tamas Jambor
Gave it another try - it seems that it picks up the variable and prints out the correct value, but still puts the metastore_db folder in the current directory, regardless. On Sat, May 16, 2015 at 1:13 PM, Tamas Jambor jambo...@gmail.com wrote: Thank you for the reply. I have tried your

Re: store hive metastore on persistent store

2015-05-16 Thread Tamas Jambor
, 2015 at 2:03 PM, Tamas Jambor jambo...@gmail.com wrote: Thanks for the reply. I am trying to use it without a Hive setup (spark-standalone), so it prints something like this: hive_ctx.sql("show tables").collect() 15/05/15 17:59:03 INFO HiveMetaStore: 0: Opening raw store with implemenation

Re: store hive metastore on persistent store

2015-05-15 Thread Tamas Jambor
in your hive-site.xml file. I have not mucked with warehouse.dir too much, but I know that the value of the metastore URI is in fact picked up from there, as I regularly point to different systems... On Thu, May 14, 2015 at 6:26 PM, Tamas Jambor jambo...@gmail.com wrote: I have tried to put

Re: store hive metastore on persistent store

2015-05-14 Thread Tamas Jambor
I have tried to put the hive-site.xml file in the conf/ directory, but it seems it is not being picked up from there. On Thu, May 14, 2015 at 6:50 PM, Michael Armbrust mich...@databricks.com wrote: You can configure Spark SQL's Hive interaction by placing a hive-site.xml file in the conf/ directory.
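Per the SO thread linked above, metastore_db lands in whatever directory the process happens to be started from because the default Derby connection URL is relative; pointing it at an absolute path fixes this. A minimal hive-site.xml sketch for conf/ (both paths are placeholders):

    <configuration>
      <property>
        <!-- Absolute Derby path; the relative default is why metastore_db
             appears in the current working directory. -->
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:derby:;databaseName=/data/hive/metastore_db;create=true</value>
      </property>
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>hdfs:///user/hive/warehouse</value>
      </property>
    </configuration>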

Re: writing to hdfs on master node much faster

2015-04-20 Thread Tamas Jambor
Not sure what would slow it down, as the repartition completes equally fast on all nodes, implying that the data is available on all of them; the few computation steps that follow are not local to the master. On Mon, Apr 20, 2015 at 12:57 PM, Sean Owen so...@cloudera.com wrote: What machines are

Re: Spark streaming

2015-03-27 Thread Tamas Jambor
It is just a comma-separated file, about 10 columns wide, which we append with a unique ID and a few additional values. On Fri, Mar 27, 2015 at 2:43 PM, Ted Yu yuzhih...@gmail.com wrote: jamborta: Please also describe the format of your csv files. Cheers On Fri, Mar 27, 2015 at 6:42 AM, DW
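A hedged sketch of that per-line enrichment (the DStream name `lines` and the extra timestamp column are assumptions for illustration):

    import time
    import uuid

    def enrich(line):
        # Append a generated unique id and an ingest timestamp to each
        # comma-separated line; uuid4 is generated per record on the executors.
        return "%s,%s,%d" % (line, uuid.uuid4(), int(time.time()))

    enriched = lines.map(enrich)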

Re: multiple sparkcontexts and streamingcontexts

2015-03-02 Thread Tamas Jambor
create and manage multiple streams - the same way that is possible with batch jobs. On Mon, Mar 2, 2015 at 2:52 PM, Sean Owen so...@cloudera.com wrote: I think everything there is to know about it is on JIRA; I don't think that's being worked on. On Mon, Mar 2, 2015 at 2:50 PM, Tamas Jambor jambo

Re: multiple sparkcontexts and streamingcontexts

2015-03-02 Thread Tamas Jambor
I have seen there is a card (SPARK-2243) to enable that. Is that still going ahead? On Mon, Mar 2, 2015 at 2:46 PM, Sean Owen so...@cloudera.com wrote: It is still not something you're supposed to do; in fact there is a setting (disabled by default) that throws an exception if you try to make

Re: multiple sparkcontexts and streamingcontexts

2015-03-02 Thread Tamas Jambor
spark-jobserver for that). On Mon, Mar 2, 2015 at 3:07 PM, Sean Owen so...@cloudera.com wrote: You can make a new StreamingContext on an existing SparkContext, I believe? On Mon, Mar 2, 2015 at 3:01 PM, Tamas Jambor jambo...@gmail.com wrote: Thanks for the reply. Actually, our main
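Sean's suggestion, sketched in PySpark (app name and batch interval are arbitrary choices): rather than multiple SparkContexts, which is unsupported, build the StreamingContext on top of the single existing SparkContext.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "shared-context")  # the one SparkContext
    ssc = StreamingContext(sc, batchDuration=10)     # reuses sc; no second context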

Re: Interact with streams in a non-blocking way

2015-02-13 Thread Tamas Jambor
Thanks for the reply. I am trying to set up a streaming-as-a-service approach, using the framework that is used for spark-jobserver. For that I would need to handle asynchronous operations that are initiated from outside the stream. Do you think it is not possible? On Fri Feb 13 2015 at
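One non-blocking pattern worth noting here: ssc.start() returns immediately (only awaitTermination() blocks), so the driver thread stays free for external requests while batches run in the background. A sketch, where `ssc` is an already-configured StreamingContext and handle_request() and shutdown_requested() are hypothetical hooks:

    ssc.start()                      # does NOT block; batches run in background
    while not shutdown_requested():  # hypothetical flag set from outside the stream
        handle_request()             # hypothetical: e.g. serve an HTTP endpoint
    ssc.stop(stopSparkContext=True, stopGraceFully=True)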

Re: one is the default value for intercepts in GeneralizedLinearAlgorithm

2015-02-06 Thread Tamas Jambor
Thanks for the reply. It seems it is all set to zero in the latest code - I was checking 1.2 last night. On Fri Feb 06 2015 at 07:21:35 Sean Owen so...@cloudera.com wrote: It looks like the initial intercept term is 1 only in the addIntercept && numOfLinearPredictor == 1 case. It does seem

Re: spark context not picking up default hadoop filesystem

2015-01-26 Thread Tamas Jambor
Thanks for the reply. I have tried to add SPARK_CLASSPATH, but I got a warning that it was deprecated (and it didn't solve the problem); I also tried to run with --driver-class-path, which did not work either. I am trying this locally. On Mon Jan 26 2015 at 15:04:03 Akhil Das ak...@sigmoidanalytics.com
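One workaround sometimes suggested when the Hadoop configuration directory is not on the classpath is to set fs.defaultFS on the context's Hadoop configuration directly; note that _jsc is a private PySpark attribute and the namenode address is a placeholder. The cleaner fix is to export HADOOP_CONF_DIR so core-site.xml is picked up automatically.

    from pyspark import SparkContext

    sc = SparkContext(appName="hdfs-default-fs")
    # Point the default filesystem at HDFS for this context only.
    sc._jsc.hadoopConfiguration().set("fs.defaultFS", "hdfs://namenode:8020")
    rdd = sc.textFile("/data/input")  # now resolves against HDFS, not file://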

Re: dynamically change receiver for a spark stream

2015-01-21 Thread Tamas Jambor
Thanks for the replies. Is this something we can get around? I tried to hack into the code without much success. On Wed, Jan 21, 2015 at 3:15 AM, Shao, Saisai saisai.s...@intel.com wrote: Hi, I don't think current Spark Streaming supports this feature; all the DStream lineage is fixed after

Re: dynamically change receiver for a spark stream

2015-01-21 Thread Tamas Jambor
to manage job lifecycle. You will still need to solve the dynamic configuration through some alternative channel. On Wed, Jan 21, 2015 at 11:30 AM, Tamas Jambor jambo...@gmail.com wrote: Thanks for the replies. Is this something we can get around? I tried to hack into the code without much success

Re: dynamically change receiver for a spark stream

2015-01-21 Thread Tamas Jambor
of passing a configuration. -kr, Gerard. On Wed, Jan 21, 2015 at 11:54 AM, Tamas Jambor jambo...@gmail.com wrote: We were thinking along the same lines, that is, to fix the number of streams and change the input and output channels dynamically, but could not make it work (seems
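A sketch of that workaround under stated assumptions: the DStream graph stays fixed, and each batch re-reads its routing from an external store. load_routes() is hypothetical (it could consult ZooKeeper, a database, or a file), and records are assumed to be dicts with a "channel" key:

    def process_batch(time, rdd):
        routes = load_routes()  # hypothetical: current channel -> output path map
        for name, path in routes.items():
            # Bind `name` per iteration via a default argument.
            subset = rdd.filter(lambda rec, n=name: rec.get("channel") == n)
            subset.saveAsTextFile("%s/%s" % (path, time.strftime("%Y%m%d%H%M%S")))

    stream.foreachRDD(process_batch)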

Re: save spark streaming output to single file on hdfs

2015-01-13 Thread Tamas Jambor
Thanks. The problem is that we'd like it to be picked up by Hive. On Tue Jan 13 2015 at 18:15:15 Davies Liu dav...@databricks.com wrote: On Tue, Jan 13, 2015 at 10:04 AM, jamborta jambo...@gmail.com wrote: Hi all, Is there a way to save dstream RDDs to a single file so that another
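The usual workaround, sketched (paths are placeholders): coalesce each batch to a single partition and write it under a directory that a Hive external table already points at, so each batch lands as one part file.

    def save_batch(time, rdd):
        if not rdd.isEmpty():  # skip empty batch directories
            path = "hdfs:///warehouse/events/batch-%s" % time.strftime("%Y%m%d%H%M%S")
            rdd.coalesce(1).saveAsTextFile(path)  # one partition => one part file

    dstream.foreachRDD(save_batch)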

Re: No module named pyspark - latest built

2014-11-12 Thread Tamas Jambor
Thanks. Will it work with sbt at some point? On Thu, 13 Nov 2014 01:03 Xiangrui Meng men...@gmail.com wrote: You need to use Maven to include Python files. See https://github.com/apache/spark/pull/1223 . -Xiangrui On Wed, Nov 12, 2014 at 4:48 PM, jamborta jambo...@gmail.com wrote: I have
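For reference, the Maven build in question follows the standard Spark build instructions:

    # Build Spark with Maven so the Python files are included in the assembly;
    # sbt-built assemblies did not include them at the time (see the PR above).
    mvn -DskipTests clean package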

Re: why decision trees do binary split?

2014-11-06 Thread Tamas Jambor
Thanks for the reply, Sean. I can see that splitting on all the categories would probably overfit the tree; on the other hand, it might give more insight into the subcategories (it would probably only work if the data is uniformly distributed between the categories). I haven't really found any
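For context, this is how MLlib is told that a feature is categorical (a sketch with assumed data shapes): categoricalFeaturesInfo maps feature index to arity, and the tree then considers grouped, but still binary, splits over the category values rather than ordered numeric splits.

    from pyspark.mllib.tree import DecisionTree

    # training_data is assumed to be an RDD[LabeledPoint]; feature 0 is
    # categorical with 4 distinct values, the rest are continuous.
    model = DecisionTree.trainClassifier(
        training_data,
        numClasses=2,
        categoricalFeaturesInfo={0: 4},
        impurity="gini",
        maxDepth=5,
    )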

Re: pass unique ID to mllib algorithms pyspark

2014-11-05 Thread Tamas Jambor
Hi Xiangrui, Thanks for the reply. Is this still due to be released in 1.2 (SPARK-3530 is still open)? Thanks, On Wed, Nov 5, 2014 at 3:21 AM, Xiangrui Meng men...@gmail.com wrote: The proposed new set of APIs (SPARK-3573, SPARK-3530) will address this issue. We carry over extra columns with
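Until those APIs land, the usual workaround is to strip the ID off before prediction and zip it back afterwards; a sketch assuming `data` is an RDD of (id, features) pairs and `model` is an already-trained MLlib model. zip() is safe here because predict() is a plain map over `features`, so partition counts and sizes line up.

    ids = data.map(lambda row: row[0])
    features = data.map(lambda row: row[1])
    predictions = model.predict(features)      # RDD of predictions, same order
    id_with_prediction = ids.zip(predictions)  # back to (id, prediction) pairs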

Re: partition size for initial read

2014-10-02 Thread Tamas Jambor
That would work. I normally use Hive queries through Spark SQL; I have not seen something like that there. On Thu, Oct 2, 2014 at 3:13 PM, Ashish Jain ashish@gmail.com wrote: If you are using textFile() to read data in, it also takes a parameter for the minimum number of partitions to
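The parameter in question, for the record (the path and the value 100 are arbitrary illustrations):

    rdd = sc.textFile("hdfs:///data/events.csv", minPartitions=100)
    print(rdd.getNumPartitions())  # at least 100, depending on input splits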

Re: spark.driver.memory is not set (pyspark, 1.1.0)

2014-10-01 Thread Tamas Jambor
Thanks, Marcelo. What's the reason it is not possible in cluster mode, either? On Wed, Oct 1, 2014 at 5:42 PM, Marcelo Vanzin van...@cloudera.com wrote: You can't set up the driver memory programmatically in client mode. In that mode, the same JVM is running the driver, so you can't modify

Re: spark.driver.memory is not set (pyspark, 1.1.0)

2014-10-01 Thread Tamas Jambor
, Tamas Jambor jambo...@gmail.com wrote: Thanks, Marcelo. What's the reason it is not possible in cluster mode, either? On Wed, Oct 1, 2014 at 5:42 PM, Marcelo Vanzin van...@cloudera.com wrote: You can't set up the driver memory programmatically in client mode. In that mode, the same JVM

Re: spark.driver.memory is not set (pyspark, 1.1.0)

2014-10-01 Thread Tamas Jambor
the --driver-memory command-line option if you are using spark-submit (bin/pyspark goes through this path, as you have discovered on your own). Does that make sense? 2014-10-01 10:17 GMT-07:00 Tamas Jambor jambo...@gmail.com: When you say "respective backend code to launch it", I thought
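In short, the driver's heap size must be fixed before the driver JVM starts, which is why it has to come from the launcher rather than from SparkConf inside the application (4g is an illustrative value):

    spark-submit --driver-memory 4g my_app.py

    # or persistently, in conf/spark-defaults.conf:
    #   spark.driver.memory  4g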

Re: yarn does not accept job in cluster mode

2014-09-29 Thread Tamas Jambor
Thanks for the reply. As I mentioned above, everything works in yarn-client mode; the problem starts when I try to run it in yarn-cluster mode. (It seems that spark-shell does not work in yarn-cluster mode, so I cannot debug that way.) On Mon, Sep 29, 2014 at 7:30 AM, Akhil Das ak...@sigmoidanalytics.com
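For reference, a yarn-cluster submission of that era looks like the line below (the script name is a placeholder). In cluster mode the driver runs inside the YARN application master, so its output ends up in the container logs rather than the local terminal:

    spark-submit --master yarn-cluster my_app.py
    # pull the driver/AM logs afterwards:
    yarn logs -applicationId <application_id>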

Re: Yarn number of containers

2014-09-25 Thread Tamas Jambor
Thank you. Where is the number of containers set? On Thu, Sep 25, 2014 at 7:17 PM, Marcelo Vanzin van...@cloudera.com wrote: On Thu, Sep 25, 2014 at 8:55 AM, jamborta jambo...@gmail.com wrote: I am running Spark with the default settings in yarn-client mode. For some reason YARN always
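In yarn-client mode the container count is the executor count (plus one container for the application master); it defaults to 2 executors and is set at submission time. The specific values below are illustrative:

    spark-submit --master yarn-client --num-executors 8 \
                 --executor-cores 2 --executor-memory 4g my_app.py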

Re: access javaobject in rdd map

2014-09-23 Thread Tamas Jambor
Hi Davies, Thanks for the reply. I saw that you do it that way in the code. Is there no other way? I have implemented all the predict functions in Scala, so I would prefer not to reimplement the whole thing in Python. Thanks, On Tue, Sep 23, 2014 at 5:40 PM, Davies Liu dav...@databricks.com
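The constraint at play, sketched (`model` and `features_rdd` are assumed names): a Py4J JavaObject is only reachable through the driver's JVM gateway, so it cannot be captured in a closure that runs on the executors.

    # Fine: a driver-side JVM call over the whole RDD.
    predictions = model.predict(features_rdd)

    # By contrast, this fails when the closure is serialized, because the
    # executors have no gateway to the wrapped JavaObject:
    # features_rdd.map(lambda x: model.predict(x))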