[jira] [Updated] (SPARK-18621) PySpark SQL Types (aka DataFrame Schema) have __repr__() with Scala and not Python representation

2016-11-29 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Romi Kuntsman updated SPARK-18621: -- Description: When using Python's repr() on an object, the expected result is a string

[jira] [Created] (SPARK-18621) PySpark SQL Types (aka DataFrame Schema) have __repr__() with Scala and not Python representation

2016-11-29 Thread Romi Kuntsman (JIRA)
Romi Kuntsman created SPARK-18621: - Summary: PySpark SQL Types (aka DataFrame Schema) have __repr__() with Scala and not Python representation Key: SPARK-18621 URL: https://issues.apache.org/jira/browse/SPARK-18621

Israel Spark Meetup

2016-09-20 Thread Romi Kuntsman
Hello, Please add a link in Spark Community page ( https://spark.apache.org/community.html) To Israel Spark Meetup (https://www.meetup.com/israel-spark-users/) We're an active meetup group, unifying the local Spark user community, and having regular meetups. Thanks! Romi K.

Re: SparkSession replace SQLContext

2016-07-05 Thread Romi Kuntsman
You can also claim that there's a whole section of "Migrating from 1.6 to 2.0" missing there: https://spark.apache.org/docs/2.0.0-preview/sql-programming-guide.html#migration-guide *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Tue, Jul 5, 2016 at 12:24 PM, nihed mb

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2016-04-24 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255658#comment-15255658 ] Romi Kuntsman commented on SPARK-4452: -- Hi, what's the reason this will only be available in Spark

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Romi Kuntsman
+1 for Java 8 only. I think it will make it easier to make a unified API for Java and Scala, instead of the wrappers of Java over Scala. On Mar 24, 2016 11:46 AM, "Stephen Boesch" wrote: > +1 for java8 only +1 for 2.11+ only. At this point scala libraries > supporting

Re: Spark 1.6.1

2016-02-22 Thread Romi Kuntsman
Sounds fair. Is it to avoid cluttering maven central with too many intermediate versions? What do I need to add in my pom.xml section to make it work? *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Tue, Feb 23, 2016 at 9:34 AM, Reynold Xin <r...@databricks.com> wrote:

Re: Spark 1.6.1

2016-02-22 Thread Romi Kuntsman
Is it possible to make RC versions available via Maven? (many projects do that) That will make integration much easier, so many more people can test the version before the final release. Thanks! *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Tue, Feb 23, 2016 at 8:07 AM, Luciano

Re: Spark 1.6.1

2016-02-02 Thread Romi Kuntsman
Hi Michael, What about the memory leak bug? https://issues.apache.org/jira/browse/SPARK-11293 Even after the memory rewrite in 1.6.0, it still happens in some cases. Will it be fixed for 1.6.1? Thanks, *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Mon, Feb 1, 2016 at 9:59 PM

[jira] [Commented] (SPARK-11293) Spillable collections leak shuffle memory

2016-01-14 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098008#comment-15098008 ] Romi Kuntsman commented on SPARK-11293: --- so add 1.6.0 as affected version... > Spilla

[jira] [Commented] (SPARK-11293) Spillable collections leak shuffle memory

2016-01-13 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096375#comment-15096375 ] Romi Kuntsman commented on SPARK-11293: --- so should it be reopened or not? Is there still a memory leak

[jira] [Commented] (SPARK-3665) Java API for GraphX

2016-01-06 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085452#comment-15085452 ] Romi Kuntsman commented on SPARK-3665: -- So at what version of Spark is it expected to happen? > J

[jira] [Issue Comment Deleted] (SPARK-3665) Java API for GraphX

2016-01-06 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Romi Kuntsman updated SPARK-3665: - Comment: was deleted (was: So at what version of Spark is it expected to happen?) > Java

[jira] [Commented] (SPARK-3665) Java API for GraphX

2016-01-06 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085454#comment-15085454 ] Romi Kuntsman commented on SPARK-3665: -- So at what version of Spark is it expected to happen

Re: Shuffle FileNotFound Exception

2015-11-18 Thread Romi Kuntsman
Take executor memory times spark.shuffle.memoryFraction, and divide the data so that each partition is smaller than the result. *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Wed, Nov 18, 2015 at 2:09 PM, Tom Arnfeld <t...@duedil.com> wrote: > Hi Romi, > > Thanks! Co
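A minimal Java sketch of that sizing rule, with illustrative numbers; the helper name, input-size estimate, and memory figures are assumptions, not from the thread:

    import org.apache.spark.api.java.JavaRDD;

    public class PartitionSizing {
        // Pick a partition count so each partition fits within the per-executor
        // shuffle budget (executor memory * spark.shuffle.memoryFraction).
        static <T> JavaRDD<T> sizePartitions(JavaRDD<T> rdd, long totalDataBytes,
                                             long executorMemoryBytes, double shuffleMemoryFraction) {
            long budgetBytes = (long) (executorMemoryBytes * shuffleMemoryFraction);
            int numPartitions = (int) ((totalDataBytes + budgetBytes - 1) / budgetBytes); // round up
            return rdd.repartition(numPartitions);
        }
    }

For example, 4 GB executors with the Spark 1.x default spark.shuffle.memoryFraction of 0.2 give a budget of roughly 800 MB per partition.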

[jira] [Commented] (SPARK-11293) Spillable collections leak shuffle memory

2015-11-17 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15008738#comment-15008738 ] Romi Kuntsman commented on SPARK-11293: --- The memory manager was rewritten there? Could it have

[jira] [Commented] (SPARK-6962) Netty BlockTransferService hangs in the middle of SQL query

2015-11-15 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15005851#comment-15005851 ] Romi Kuntsman commented on SPARK-6962: -- What's the status of this? Something similar happens to me

Re: Some spark apps fail with "All masters are unresponsive", while others pass normally

2015-11-09 Thread Romi Kuntsman
If they have a problem managing memory, wouldn't there be an OOM? Why does AppClient throw an NPE? *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Mon, Nov 9, 2015 at 4:59 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote: > Is that all you have in the executo

Re: Some spark apps fail with "All masters are unresponsive", while others pass normally

2015-11-09 Thread Romi Kuntsman
timeout etc) *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Mon, Nov 9, 2015 at 6:00 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote: > Did you find anything regarding the OOM in the executor logs? > > Thanks > Best Regards > > On Mon, Nov 9, 2015 at 8

[jira] [Commented] (SPARK-3767) Support wildcard in Spark properties

2015-11-09 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996389#comment-14996389 ] Romi Kuntsman commented on SPARK-3767: -- [~andrewor14] what's going on with this issue? I found

Re: Ready to talk about Spark 2.0?

2015-11-08 Thread Romi Kuntsman
be in Spark 2.0) *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Fri, Nov 6, 2015 at 2:53 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote: > Hi Sean, > > Happy to see this discussion. > > I'm working on PoC to run Camel on Spark Streaming. The purpose is t

Re: Ready to talk about Spark 2.0?

2015-11-08 Thread Romi Kuntsman
multiple levels of aggregations, iterative machine learning algorithms etc. Sending the whole "workplan" to the Spark framework would be, as I see it, the next step of its evolution, like stored procedures send logic with many SQL queries to the database. Was it more clear this time?

Re: Ready to talk about Spark 2.0?

2015-11-08 Thread Romi Kuntsman
different, and building the framework around that will benefit each of those flows (like events instead of microbatches in streaming, worker-side intermediate processing in batch, etc). So where is the best place to have a full Spark 2.0 discussion? *Romi Kuntsman*, *Big Data Engineer* http://www.t

Re: JMX with Spark

2015-11-05 Thread Romi Kuntsman
Have you read this? https://spark.apache.org/docs/latest/monitoring.html *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Thu, Nov 5, 2015 at 2:08 PM, Yogesh Vyas <informy...@gmail.com> wrote: > Hi, > How we can use JMX and JConsole to monitor our Spark

Re: DataFrame.toJavaRDD cause fetching data to driver, is it expected ?

2015-11-04 Thread Romi Kuntsman
I noticed that toJavaRDD causes a computation on the DataFrame, so is it considered an action, even though logically it's a transformation? On Nov 4, 2015 6:51 PM, "Aliaksei Tsyvunchyk" wrote: > Hello folks, > > Recently I have noticed unexpectedly big network traffic

Re: DataFrame.toJavaRDD cause fetching data to driver, is it expected ?

2015-11-04 Thread Romi Kuntsman
; perform map/reduce on dataFrame without causing it to load all data to > driver program ? > > On Nov 4, 2015, at 12:34 PM, Romi Kuntsman <r...@totango.com> wrote: > > I noticed that toJavaRDD causes a computation on the DataFrame, so is it > considered an action, even tho

Re: Getting Started

2015-11-02 Thread Romi Kuntsman
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Fri, Oct 30, 2015 at 1:25 PM, Saurabh Shah <shahsaurabh0...@gmail.com> wrote: > Hello, my name is Saurabh Shah and I am a second year undergraduate

Re: Error : - No filesystem for scheme: spark

2015-11-02 Thread Romi Kuntsman
except "spark.master", do you have "spark://" anywhere in your code or config files? *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Mon, Nov 2, 2015 at 11:27 AM, Balachandar R.A. <balachandar...@gmail.com> wrote: > > -- Forwarded message

Some spark apps fail with "All masters are unresponsive", while others pass normally

2015-11-01 Thread Romi Kuntsman
103) at org.apache.spark.SparkContext.getSchedulingMode(SparkContext.scala:1501) at org.apache.spark.SparkContext.postEnvironmentUpdate(SparkContext.scala:2005) at org.apache.spark.SparkContext.<init>(SparkContext.scala:543) at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61) Thanks! *Romi Kuntsman*, *Big D

Re: NullPointerException when cache DataFrame in Java (Spark1.5.1)

2015-10-29 Thread Romi Kuntsman
Did you try to cache a DataFrame with just a single row? Do your rows have any columns with null values? Can you post a code snippet here on how you load/generate the dataframe? Does dataframe.rdd.cache work? *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Thu, Oct 29, 2015 at 4:33

Re: NullPointerException when cache DataFrame in Java (Spark1.5.1)

2015-10-29 Thread Romi Kuntsman
thrown from PixelObject? Are you running spark with master=local, so it's running inside your IDE and you can see the errors from the driver and worker? *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Thu, Oct 29, 2015 at 10:04 AM, Zhang, Jingyu <jingyu.zh...@news.com.au> wro

[jira] [Commented] (SPARK-11229) NPE in JoinedRow.isNullAt when spark.shuffle.memoryFraction=0

2015-10-22 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968662#comment-14968662 ] Romi Kuntsman commented on SPARK-11229: --- [~marmbrus] it's reproducible in 1.5.1 as [~xwu0226

[jira] [Commented] (SPARK-7335) Submitting a query to Thrift Server occurs error: java.lang.IllegalStateException: unread block data

2015-10-21 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966490#comment-14966490 ] Romi Kuntsman commented on SPARK-7335: -- [~meiyoula] can you please reopen the issue? I got

[jira] [Created] (SPARK-11229) NPE in JoinedRow.isNullAt when spark.shuffle.memoryFraction=0

2015-10-21 Thread Romi Kuntsman (JIRA)
Romi Kuntsman created SPARK-11229: - Summary: NPE in JoinedRow.isNullAt when spark.shuffle.memoryFraction=0 Key: SPARK-11229 URL: https://issues.apache.org/jira/browse/SPARK-11229 Project: Spark

[jira] [Commented] (SPARK-11153) Turns off Parquet filter push-down for string and binary columns

2015-10-21 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966306#comment-14966306 ] Romi Kuntsman commented on SPARK-11153: --- Does this mean that all Spark 1.5.1 users are recommended to set

[jira] [Created] (SPARK-11228) Job stuck in Executor failure loop when NettyTransport failed to bind

2015-10-21 Thread Romi Kuntsman (JIRA)
Romi Kuntsman created SPARK-11228: - Summary: Job stuck in Executor failure loop when NettyTransport failed to bind Key: SPARK-11228 URL: https://issues.apache.org/jira/browse/SPARK-11228 Project

[jira] [Commented] (SPARK-2563) Re-open sockets to handle connect timeouts

2015-10-13 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955264#comment-14955264 ] Romi Kuntsman commented on SPARK-2563: -- I got a socket timeout in Spark 1.4.0. Is this still relevant

Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Romi Kuntsman
An RDD is a set of data rows (in your case numbers); there is no meaning to the order of the items. What exactly are you trying to accomplish? *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Mon, Sep 21, 2015 at 2:29 PM, Zhiliang Zhu <zchl.j...@yahoo.com.invalid> wrote:
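That said, if the rows do have a well-defined order (e.g., the RDD was sorted first), one common pattern is to index the rows and join each one with its successor. A sketch in Java 8, assuming a JavaRDD<Double> already in the desired order; the class and method names are illustrative:

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import scala.Tuple2;

    public class AdjacentDiff {
        // Subtracts each row from the one after it: result[i] = input[i+1] - input[i].
        static JavaRDD<Double> diffs(JavaRDD<Double> ordered) {
            JavaPairRDD<Long, Double> indexed =
                ordered.zipWithIndex().mapToPair(t -> new Tuple2<>(t._2(), t._1()));
            JavaPairRDD<Long, Double> successors =
                indexed.mapToPair(t -> new Tuple2<>(t._1() - 1, t._2())); // shift index back by one
            return indexed.join(successors).map(t -> t._2()._2() - t._2()._1());
        }
    }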

Re: passing SparkContext as parameter

2015-09-21 Thread Romi Kuntsman
sparkContext is available on the driver, not on executors. To read from Cassandra, you can use something like this: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/2_loading.md *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Mon, Sep 21, 2015 at 2:27 PM
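A minimal sketch of that connector's Java API, assuming the spark-cassandra-connector Java artifact is on the classpath; the keyspace and table names are placeholders:

    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
    import org.apache.spark.api.java.JavaSparkContext;

    public class CassandraRead {
        static long countRows(JavaSparkContext sc) {
            // The RDD is defined on the driver; the actual reads run on the executors.
            return javaFunctions(sc)
                .cassandraTable("my_keyspace", "my_table") // placeholder names
                .count();
        }
    }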

Re: passing SparkContext as parameter

2015-09-21 Thread Romi Kuntsman
foreach is something that runs on the driver, not the workers. if you want to perform some function on each record from cassandra, you need to do cassandraRdd.map(func), which will run distributed on the spark workers *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Mon, Sep 21

Re: how to send additional configuration to the RDD after it was lazily created

2015-09-21 Thread Romi Kuntsman
again. *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Thu, Sep 17, 2015 at 10:07 AM, Gil Vernik <g...@il.ibm.com> wrote: > Hi, > > I have the following case, which i am not sure how to resolve. > > My code uses HadoopRDD and creates various RDDs on top of i

Re: how to get RDD from two different RDDs with cross column

2015-09-21 Thread Romi Kuntsman
Hi, If I understand correctly: rdd1 contains keys (of type StringDate), rdd2 contains keys and values, and rdd3 should contain all the keys, and the values from rdd2? I think you should make rdd1 and rdd2 PairRDDs, and then use an outer join. Does that make sense? On Mon, Sep 21, 2015 at 8:37 PM Zhiliang
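A sketch of that pairing-plus-outer-join idea in Java, keeping every key from the first RDD and pulling the matching value (if any) from the second. It assumes Spark 2.x, where the Java API uses org.apache.spark.api.java.Optional; the element types are illustrative:

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.Optional;
    import scala.Tuple2;

    public class KeyValueOuterJoin {
        // rdd1: bare keys; rdd2: key/value pairs; result: every key with its value if present.
        static JavaPairRDD<String, Optional<Integer>> joinAll(JavaRDD<String> rdd1,
                                                              JavaPairRDD<String, Integer> rdd2) {
            JavaPairRDD<String, Boolean> keyed = rdd1.mapToPair(k -> new Tuple2<>(k, true));
            return keyed.leftOuterJoin(rdd2)
                        .mapToPair(t -> new Tuple2<>(t._1(), t._2()._2()));
        }
    }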

Re: passing SparkContext as parameter

2015-09-21 Thread Romi Kuntsman
Mon, Sep 21, 2015 at 7:36 AM, Romi Kuntsman <r...@totango.com> wrote: > >> foreach is something that runs on the driver, not the workers. >> >> if you want to perform some function on each record from cassandra, you >> need to do cassandraRdd.map(func), which will run di

[jira] [Commented] (SPARK-5421) SparkSql throw OOM at shuffle

2015-09-08 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734561#comment-14734561 ] Romi Kuntsman commented on SPARK-5421: -- Does this still happen on the latest version? I got some OOM

How to determine the value for spark.sql.shuffle.partitions?

2015-09-01 Thread Romi Kuntsman
Hi all, The number of partitions greatly affects the speed and efficiency of calculation, in my case in DataFrames/SparkSQL on Spark 1.4.0. Too few partitions with large data cause OOM exceptions. Too many partitions on small data cause a delay due to overhead. How do you programmatically
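One programmatic hook, shown as a heuristic sketch rather than an official formula: estimate the input size and set spark.sql.shuffle.partitions before running the query. The target partition size below is an assumption:

    import org.apache.spark.sql.SQLContext;

    public class ShufflePartitionTuning {
        // Aim for shuffle partitions of roughly targetPartitionBytes each (e.g. 128 MB).
        static void tune(SQLContext sqlContext, long estimatedInputBytes, long targetPartitionBytes) {
            long n = Math.max(1L, estimatedInputBytes / targetPartitionBytes);
            sqlContext.setConf("spark.sql.shuffle.partitions", Long.toString(n));
        }
    }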

Re: How to remove worker node but let it finish first?

2015-08-29 Thread Romi Kuntsman
://mesos.apache.org/documentation/latest/app-framework-development-guide/ Thanks Best Regards On Mon, Aug 24, 2015 at 12:11 PM, Romi Kuntsman r...@totango.com wrote: Hi, I have a spark standalone cluster with 100s of applications per day, and it changes size (more or less workers) at various hours

Re: Exception when S3 path contains colons

2015-08-25 Thread Romi Kuntsman
Hello, We had the same problem. I've written a blog post with the detailed explanation and workaround: http://labs.totango.com/spark-read-file-with-colon/ Greetings, Romi K. On Tue, Aug 25, 2015 at 2:47 PM Gourav Sengupta gourav.sengu...@gmail.com wrote: I am not quite sure about this but

How to remove worker node but let it finish first?

2015-08-24 Thread Romi Kuntsman
Hi, I have a spark standalone cluster with 100s of applications per day, and it changes size (more or fewer workers) at various hours. The driver runs on a separate machine outside the spark cluster. When a job is running and its worker is killed (because at that hour the number of workers is

Re: How to overwrite partition when writing Parquet?

2015-08-20 Thread Romi Kuntsman
structure, check this out... http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery On Wed, Aug 19, 2015 at 8:18 PM, Romi Kuntsman r...@totango.com wrote: Hello, I have a DataFrame, with a date column which I want to use as a partition. Each day I want to write

[jira] [Created] (SPARK-10135) Percent of pruned partitions is shown wrong

2015-08-20 Thread Romi Kuntsman (JIRA)
Romi Kuntsman created SPARK-10135: - Summary: Percent of pruned partitions is shown wrong Key: SPARK-10135 URL: https://issues.apache.org/jira/browse/SPARK-10135 Project: Spark Issue Type

How to overwrite partition when writing Parquet?

2015-08-19 Thread Romi Kuntsman
Hello, I have a DataFrame with a date column which I want to use as a partition. Each day I want to write the data for the same date in Parquet, and then read a dataframe for a date range. I'm using: myDataframe.write().partitionBy(date).mode(SaveMode.Overwrite).parquet(parquetDir); If I use
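In Spark 1.x, SaveMode.Overwrite on the dataset root replaces every partition, not just the day being written. A commonly used workaround, sketched below under the assumption of the standard date=... directory layout that partition discovery expects, is to write each day's rows straight into its own partition path:

    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SaveMode;

    public class DailyParquetWriter {
        // Overwrites only the given day's directory, leaving other dates untouched.
        // oneDay should hold a single date; the date itself is encoded in the path.
        static void writeDay(DataFrame oneDay, String parquetDir, String date) {
            oneDay.write()
                  .mode(SaveMode.Overwrite)
                  .parquet(parquetDir + "/date=" + date);
        }
    }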

Re: How to minimize shuffling on Spark dataframe Join?

2015-08-19 Thread Romi Kuntsman
If you create a PairRDD from the DataFrame, using dataFrame.toJavaRDD().mapToPair(), then you can call partitionBy(someCustomPartitioner) which will partition the RDD by the key (of the pair). Then the operations on it (like joining with another RDD) will consider this partitioning. I'm not sure that
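A sketch of that idea in Java; the generic helper is illustrative. Partitioning both sides with the same partitioner lets the join reuse that layout instead of reshuffling both inputs:

    import org.apache.spark.HashPartitioner;
    import org.apache.spark.api.java.JavaPairRDD;
    import scala.Tuple2;

    public class CopartitionedJoin {
        // Both sides share one partitioner, so the join avoids a second full shuffle.
        static <V, W> JavaPairRDD<String, Tuple2<V, W>> join(JavaPairRDD<String, V> left,
                                                             JavaPairRDD<String, W> right,
                                                             int numPartitions) {
            HashPartitioner p = new HashPartitioner(numPartitions);
            return left.partitionBy(p).join(right.partitionBy(p));
        }
    }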

Re: Issues with S3 paths that contain colons

2015-08-19 Thread Romi Kuntsman
I had the exact same issue, and overcame it by overriding NativeS3FileSystem with my own class, where I replaced the implementation of globStatus. It's a hack but it works. Then I set the hadoop config fs.myschema.impl to my class name, and accessed the files through myschema:// instead of s3n://
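A minimal sketch of that hack, assuming Hadoop's NativeS3FileSystem is on the classpath; the actual implementation described in the message may differ, and the myschema scheme name is the hypothetical one from the thread:

    import java.io.IOException;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.s3native.NativeS3FileSystem;

    public class ColonSafeS3FileSystem extends NativeS3FileSystem {
        // Glob expansion chokes on ':' in key names, so treat the path as a literal.
        @Override
        public FileStatus[] globStatus(Path pathPattern) throws IOException {
            return new FileStatus[] { getFileStatus(pathPattern) };
        }
    }

    // Registration in the Hadoop Configuration, then access files via myschema://
    // conf.set("fs.myschema.impl", ColonSafeS3FileSystem.class.getName());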

Re: spark as a lookup engine for dedup

2015-07-27 Thread Romi Kuntsman
Is Spark RDD not fit for this requirement? On Mon, Jul 27, 2015 at 1:08 PM, Romi Kuntsman r...@totango.com wrote: What the throughput of processing and for how long do you need to remember duplicates? You can take all the events, put them in an RDD, group by the key, and then process each key

Re: spark as a lookup engine for dedup

2015-07-27 Thread Romi Kuntsman
What is the throughput of processing, and for how long do you need to remember duplicates? You can take all the events, put them in an RDD, group by the key, and then process each key only once. But if you have a long running application where you want to check that you didn't see the same value
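A sketch of the grouping step in Java; reduceByKey is swapped in here for the group-by described above, since it keeps one event per key without buffering whole groups in memory:

    import org.apache.spark.api.java.JavaPairRDD;

    public class Dedup {
        // Keeps an arbitrary single event per key.
        static <K, V> JavaPairRDD<K, V> firstPerKey(JavaPairRDD<K, V> events) {
            return events.reduceByKey((a, b) -> a);
        }
    }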

Re: Scaling spark cluster for a running application

2015-07-22 Thread Romi Kuntsman
Are you running the Spark cluster in standalone or YARN? In standalone, the application gets the available resources when it starts. With YARN, you can try to turn on the setting *spark.dynamicAllocation.enabled*. See https://spark.apache.org/docs/latest/configuration.html On Wed, Jul 22, 2015 at
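A sketch of the YARN-side setup; the executor bounds are illustrative, and note that dynamic allocation also requires the external shuffle service:

    import org.apache.spark.SparkConf;

    public class DynamicAllocation {
        static SparkConf conf() {
            return new SparkConf()
                .set("spark.dynamicAllocation.enabled", "true")
                .set("spark.shuffle.service.enabled", "true") // required by dynamic allocation
                .set("spark.dynamicAllocation.minExecutors", "1")   // illustrative bound
                .set("spark.dynamicAllocation.maxExecutors", "20"); // illustrative bound
        }
    }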

Applications metrics unseparatable from Master metrics?

2015-07-22 Thread Romi Kuntsman
Hi, I tried to enable Master metrics source (to get number of running/waiting applications etc), and connected it to Graphite. However, when these are enabled, application metrics are also sent. Is it possible to separate them, and send only master metrics without applications? I see that

Re: Timestamp functions for sqlContext

2015-07-21 Thread Romi Kuntsman
Hi Tal, I'm not sure there is currently a built-in function for it, but you can easily define a UDF (user defined function) by extending org.apache.spark.sql.api.java.UDF1, registering it (sqlContext.udf().register(...)), and then using it inside your query. RK. On Tue, Jul 21, 2015 at 7:04
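A sketch of that registration in the Spark 1.x Java API; the UDF name and the formatting logic are made up for illustration:

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import org.apache.spark.sql.SQLContext;
    import org.apache.spark.sql.api.java.UDF1;
    import org.apache.spark.sql.types.DataTypes;

    public class UdfSetup {
        static void register(SQLContext sqlContext) {
            sqlContext.udf().register("toDay", new UDF1<Long, String>() {
                @Override
                public String call(Long epochMillis) {
                    // Formats an epoch-millis timestamp as yyyy-MM-dd.
                    return new SimpleDateFormat("yyyy-MM-dd").format(new Date(epochMillis));
                }
            }, DataTypes.StringType);
            // Usable afterwards as: sqlContext.sql("SELECT toDay(ts) FROM events")
        }
    }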

Spark Application stuck retrying task failed on Java heap space?

2015-07-21 Thread Romi Kuntsman
Hello, *TL;DR: task crashes with OOM, but application gets stuck in infinite loop retrying the task over and over again instead of failing fast.* Using Spark 1.4.0, standalone, with DataFrames on Java 7. I have an application that does some aggregations. I played around with shuffling settings,
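One knob that can turn this failure mode into a fast failure, sketched below; whether it helps depends on where the retries actually happen:

    import org.apache.spark.SparkConf;

    public class FailFast {
        static SparkConf conf() {
            // Default is 4; with 1, the first task failure (e.g. an OOM) fails the stage.
            return new SparkConf().set("spark.task.maxFailures", "1");
        }
    }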

Re: One corrupt gzip in a directory of 100s

2015-04-01 Thread Romi Kuntsman
by retrying). *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Wed, Apr 1, 2015 at 12:58 PM, Gil Vernik g...@il.ibm.com wrote: I actually saw the same issue, where we analyzed some container with few hundreds of GBs zip files - one was corrupted and Spark exit with Exception

[jira] [Commented] (SPARK-2579) Reading from S3 returns an inconsistent number of items with Spark 0.9.1

2015-02-17 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324285#comment-14324285 ] Romi Kuntsman commented on SPARK-2579: -- Does this still happen with Spark 1.2.1

[jira] [Commented] (SPARK-4879) Missing output partitions after job completes with speculative execution

2015-02-17 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324278#comment-14324278 ] Romi Kuntsman commented on SPARK-4879: -- Could this happen very very rarely when

Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-19 Thread Romi Kuntsman
I have recently encountered a similar problem with Guava version collision with Hadoop. Isn't it more correct to upgrade Hadoop to use the latest Guava? Why are they staying in version 11, does anyone know? *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Wed, Jan 7, 2015 at 7:59

Re: Spark client reconnect to driver in yarn-cluster deployment mode

2015-01-19 Thread Romi Kuntsman
to you *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Thu, Jan 15, 2015 at 1:52 PM, preeze etan...@gmail.com wrote: From the official spark documentation (http://spark.apache.org/docs/1.2.0/running-on-yarn.html): In yarn-cluster mode, the Spark driver runs inside an application

Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-19 Thread Romi Kuntsman
Actually there is already someone on Hadoop-Common-Dev taking care of removing the old Guava dependency http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201501.mbox/browser https://issues.apache.org/jira/browse/HADOOP-11470 *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com

Re: Announcing Spark 1.1.1!

2014-12-03 Thread Romi Kuntsman
About version compatibility and upgrade path - can the Java application dependencies and the Spark server be upgraded separately (i.e. will 1.1.0 library work with 1.1.1 server, and vice versa), or do they need to be upgraded together? Thanks! *Romi Kuntsman*, *Big Data Engineer* http

ExternalAppendOnlyMap: Thread spilling in-memory map of to disk many times slowly

2014-11-26 Thread Romi Kuntsman
of 12 MB to disk (36 times so far) 14/11/24 13:13:45 INFO ExternalAppendOnlyMap: Thread 64 spilling in-memory map of 11 MB to disk (37 times so far) 14/11/24 13:13:56 INFO FileOutputCommitter: Saved output of task 'attempt_201411241250__m_00_90' to s3n://mybucket/mydir/output *Romi Kuntsman

ExternalAppendOnlyMap: Thread spilling in-memory map of to disk many times slowly

2014-11-24 Thread Romi Kuntsman
of 12 MB to disk (36 times so far) 14/11/24 13:13:45 INFO ExternalAppendOnlyMap: Thread 64 spilling in-memory map of 11 MB to disk (37 times so far) 14/11/24 13:13:56 INFO FileOutputCommitter: Saved output of task 'attempt_201411241250__m_00_90' to s3n://mybucket/mydir/output *Romi Kuntsman

[jira] [Commented] (SPARK-2867) saveAsHadoopFile() in PairRDDFunction.scala should allow use other OutputCommiter class

2014-11-12 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207997#comment-14207997 ] Romi Kuntsman commented on SPARK-2867: -- In the latest code, it seems to be resolved

Re: Spark job resource allocation best practices

2014-11-04 Thread Romi Kuntsman
How can I configure Mesos allocation policy to share resources between all current Spark applications? I can't seem to find it in the architecture docs. *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Tue, Nov 4, 2014 at 9:11 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Yes

Re: Spark job resource allocation best practices

2014-11-04 Thread Romi Kuntsman
I have a single Spark cluster, not multiple frameworks and not multiple versions. Is it relevant for my use-case? Where can I find information about exactly how to make Mesos tell Spark how many resources of the cluster to use? (instead of the default take-all) *Romi Kuntsman*, *Big Data Engineer

Re: Spark job resource allocation best practices

2014-11-04 Thread Romi Kuntsman
Let's say that I run Spark on Mesos in fine-grained mode, and I have 12 cores and 64GB memory. I run application A on Spark, and some time after that (but before A finished) application B. How many CPUs will each of them get? *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Tue

Spark job resource allocation best practices

2014-11-03 Thread Romi Kuntsman
together, and together lets them use all the available resources? - How do you divide resources between applications on your usecase? P.S. I started reading about Mesos but couldn't figure out if/how it could solve the described issue. Thanks! *Romi Kuntsman*, *Big Data Engineer* http

Re: Spark job resource allocation best practices

2014-11-03 Thread Romi Kuntsman
the resources between them, according to how many are trying to run at the same time? So for example if I have 12 cores - if one job is scheduled, it will get 12 cores, but if 3 are scheduled, then each one will get 4 cores and they will all start. Thanks! *Romi Kuntsman*, *Big Data Engineer* http

Re: Dynamically switching Nr of allocated core

2014-11-03 Thread Romi Kuntsman
just 2 cores (as you said it will get even when there are 12 available), but gets nothing. 4 - Until I stop app B, app A is stuck waiting, instead of app B freeing 2 cores and dropping to 10 cores. *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Mon, Nov 3, 2014 at 3:17 PM, RodrigoB

Re: Workers disconnected from master sometimes and never reconnect back

2014-09-29 Thread Romi Kuntsman
to the master. Using Spark 1.1.0. What if a master server is restarted, should the worker retry to register with it? Greetings, -- *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com Join the Customer Success Manifesto http://youtu.be/XvFi2Wh6wgU

Re: [Swftools-common] Access Violation in swf_GetU8

2012-10-14 Thread Romi Kuntsman
=gmail@nongnu.org [mailto:swftools-common-bounces+imuserpol+swftools=gmail@nongnu.org] On Behalf Of lists Sent: Tuesday, October 02, 2012 11:32 AM To: swftools-common@nongnu.org Cc: Romi Kuntsman Subject: Re: [Swftools-common] Access Violation in swf_GetU8 On Tue, 2 Oct 2012 15:23:04

Re: [Swftools-common] Access Violation in swf_GetU8

2012-09-30 Thread Romi Kuntsman
+0300, Romi Kuntsman rmk...@gmail.com wrote: U8 swf_GetU8(TAG * t) { swf_ResetReadBits(t); #ifdef DEBUG_RFXSWF if ((int)t->pos>=(int)t->len) { fprintf(stderr,"GetU8() out of bounds: TagID = %i\n",t->id); *(int*)0=0; return 0; } #endif return t->data[t->pos

Re: [Swftools-common] Access Violation in swf_GetU8

2012-09-02 Thread Romi Kuntsman
*)0 = 0xdead; swftools-2012-04-08-0857\lib\rfxswf.c (1 hits) Line 97: *(int*)0=0; On Sun, Sep 2, 2012 at 12:47 PM, Romi Kuntsman rmk...@gmail.com wrote: Hi, This code CRASHES the program: *(int*)0=0; U8 swf_GetU8(TAG * t) { swf_ResetReadBits(t); #ifdef DEBUG_RFXSWF if ((int)t

Re: [Swftools-common] Passing swf in stdin/pipe to swfdump

2012-08-26 Thread Romi Kuntsman
Isn't it possible to read from stdin into a buffer in memory, then determine its size, and then go over the data in memory? On Sun, Aug 26, 2012 at 3:01 AM, Matthias Kramm kr...@quiss.org wrote: On Tue, Aug 07, 2012 at 03:30:20PM +0300, Romi Kuntsman rmk...@gmail.com wrote: I'm handling

[Swftools-common] Passing swf in stdin/pipe to swfdump

2012-08-07 Thread Romi Kuntsman
Hello, I'm handling a SWF file in memory in my program, and would like to pass the file to swfdump and read the output. How can this be done without writing it to a temporary file on disk and then passing the filename as parameter, for example using a pipe or similar option? If the code

Re: [Swftools-common] clickable swf from gif/png/jpg

2008-08-13 Thread Romi Kuntsman
Also, embedding this swf from gif using swfc *doesn't* make it clickable. 2008/8/13, Romi Kuntsman [EMAIL PROTECTED]: 1. gif2swf doesn't handle the animated frames correctly, erasing the entire image instead of just changed places. see attached example. 2. backgroundcolor doesn't work, you

[Swftools-common] clickable swf from gif/png/jpg

2008-08-03 Thread Romi Kuntsman
Hello, I'm using png2swf, gif2swf, jpg2swf to convert images to SWFs. How can I make them clickable - so a click would lead to the standard clickTAG url, or a predefined URL? Thanks, -- Romi Kuntsman | High Performance Software Engineer RockeTier - Startup your engines | http