JIRA: Wrong dates from imported JIRAs

2015-12-10 Thread Lars Francke
Hi,

I've been digging into JIRA a bit and found a batch of old issues (~250)
that I assume are all from the old JIRA.

Here's one example:

Old: 
New: 

created": "0012-08-21T09:03:00.000-0800",

That's quite impressive but wrong :)

That means that when you sort all Apache JIRAs by creation date, Spark comes
first: <
https://issues.apache.org/jira/issues/?jql=order%20By%20createdDate%20ASC&startIndex=250
>

The dates were already wrong in the source JIRA.

Now it seems as if those can be fixed using a CSV import. I still remember
how painful the initial import was, but this looks relatively
straightforward: <
https://confluence.atlassian.com/display/JIRAKB/How+to+change+the+issue+creation+date+using+CSV+import
>

If everyone's okay with it, I'd raise it with INFRA (and would prepare the
necessary CSV file), but as I'm not a committer it'd be great if one or more
of the committers could give me a +1.

Cheers,
Lars


coalesce at DataFrame missing argument for shuffle.

2015-12-10 Thread Hyukjin Kwon
Hi all,

I happened to come across the coalesce() function and noticed that it takes
different arguments for RDD and DataFrame.

It looks like the shuffle option is missing for DataFrame.

I understand that repartition() works exactly like coalesce() with shuffling,
but it seems a bit odd that the same function takes different arguments in the
two APIs when this could easily be resolved by adding a single argument.
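
For reference, a minimal spark-shell sketch of the two APIs as they stand
(method names as in Spark 1.6; the numbers are just placeholders):

// RDD API: coalesce exposes an optional shuffle flag.
val rdd = sc.parallelize(1 to 100, 10)
rdd.coalesce(2)                   // narrow dependency, no shuffle
rdd.coalesce(20, shuffle = true)  // shuffles, so it can also increase partitions

// DataFrame API: coalesce has no shuffle flag; repartition always shuffles.
val df = sqlContext.range(100)
df.coalesce(2)       // narrow dependency only
df.repartition(20)   // full shuffle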

Could anybody tell me whether this is intentionally missing or not?

Thanks!


RE: Specifying Scala types when calling methods from SparkR

2015-12-10 Thread Sun, Rui
Hi, Chris,

I see your point: the objectFile and saveAsObjectFile pair in SparkR can only be 
used in a SparkR context, as the content of the RDD is assumed to be serialized R 
objects.

It's fine to drop down to the JVM level in the case where the model is saved as an 
objectFile in Scala and then loaded in SparkR. But I don't understand "but that 
seems to only work if you specify the type"; shouldn't there be no need to specify 
the type because of type erasure?

Did you try something like converting the RDD to a DataFrame, saving it, and then 
loading it as a DataFrame in SparkR and converting it back to an RDD?

From: Chris Freeman [mailto:cfree...@alteryx.com]
Sent: Friday, December 11, 2015 2:47 AM
To: Sun, Rui; shiva...@eecs.berkeley.edu
Cc: dev@spark.apache.org
Subject: RE: Specifying Scala types when calling methods from SparkR

Hi Sun Rui,

I’ve had some luck simply using “objectFile” when saving from SparkR directly. 
The problem is that if you do it that way, the model object will only work if 
you continue to use the current Spark Context, and I think model persistence 
should really enable you to use the model at a later time. That’s where I found 
that I could drop down to the JVM level and interact with the Scala object 
directly, but that seems to only work if you specify the type.



On December 9, 2015 at 7:59:43 PM, Sun, Rui 
(rui@intel.com) wrote:
Hi,

Just use ""objectFile" instead of "objectFile[PipelineModel]" for callJMethod. 
You can take the objectFile() in context.R as example.

Since the SparkContext created in SparkR is actually a JavaSparkContext, there 
is no need to pass the implicit ClassTag.
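
To make the difference concrete, here is a rough Scala-side sketch of the two
signatures involved (paraphrased from the Spark 1.6 sources; the path and
partition count are made up, and javap would confirm the exact erased shape):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.api.java.JavaSparkContext

val sc  = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("objectFile-demo"))
val jsc = new JavaSparkContext(sc)

// SparkContext: def objectFile[T: ClassTag](path: String, minPartitions: Int): RDD[T]
// The implicit ClassTag becomes an extra trailing argument after erasure, which
// is why calling "objectFile[PipelineModel]" by name from SparkR cannot work.
val fromScala = sc.objectFile[Array[Byte]]("/tmp/model-objectfile", 2)

// JavaSparkContext: def objectFile[T](path: String, minPartitions: Int): JavaRDD[T]
// No ClassTag in the signature (a fake one is supplied internally), so
// callJMethod can hit it with just (path, minPartitions).
val fromJava = jsc.objectFile[Array[Byte]]("/tmp/model-objectfile", 2)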

-Original Message-
From: Shivaram Venkataraman [mailto:shiva...@eecs.berkeley.edu]
Sent: Thursday, December 10, 2015 8:21 AM
To: Chris Freeman
Cc: dev@spark.apache.org
Subject: Re: Specifying Scala types when calling methods from SparkR

The SparkR callJMethod can only invoke methods as they show up in the Java byte 
code. So in this case you'll need to check the SparkContext byte code (with 
javap or something like that) to see how that method looks. My guess is the 
type is passed in as a class tag argument, so you'll need to do something like 
create a class tag for the LinearRegressionModel and pass that in as the first 
or last argument etc.

Thanks
Shivaram
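
As a Scala-side illustration of "create a class tag ... and pass that in"
(hypothetical path and partition count; for SparkContext.objectFile the tag
ends up as the last argument after erasure):

import scala.reflect.ClassTag
import org.apache.spark.ml.PipelineModel

// Build the ClassTag the erased method expects and supply it explicitly
// instead of relying on implicit resolution.
val tag: ClassTag[PipelineModel] = ClassTag(classOf[PipelineModel])
val viaImplicit = sc.objectFile[PipelineModel]("/tmp/model", 2)  // tag filled in implicitly
val viaExplicit = sc.objectFile("/tmp/model", 2)(tag)            // tag passed explicitly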

On Wed, Dec 9, 2015 at 10:11 AM, Chris Freeman 
(cfree...@alteryx.com) wrote:
> Hey everyone,
>
> I’m currently looking at ways to save out SparkML model objects from
> SparkR and I’ve had some luck putting the model into an RDD and then
> saving the RDD as an Object File. Once it’s saved, I’m able to load it
> back in with something like:
>
> sc.objectFile[LinearRegressionModel]("path/to/model")
>
> I’d like to try and replicate this same process from SparkR using the
> JVM backend APIs (e.g. “callJMethod”), but so far I haven’t been able
> to replicate my success and I’m guessing that it’s (at least in part)
> due to the necessity of specifying the type when calling the objectFile 
> method.
>
> Does anyone know if this is actually possible? For example, here’s
> what I’ve come up with so far:
>
> loadModel <- function(sc, modelPath) {
>   modelRDD <- SparkR:::callJMethod(sc,
>                                    "objectFile[PipelineModel]",
>                                    modelPath,
>                                    SparkR:::getMinPartitions(sc, NULL))
>   return(modelRDD)
> }
>
> Any help is appreciated!
>
> --
> Chris Freeman
>

-
To unsubscribe, e-mail: 
dev-unsubscr...@spark.apache.org For 
additional commands, e-mail: 
dev-h...@spark.apache.org


Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-10 Thread Michael Armbrust
Cutting RC2 now.

On Thu, Dec 10, 2015 at 12:59 PM, Michael Armbrust 
wrote:

> We are getting close to merging patches for SPARK-12155
>  and SPARK-12253
> .  I'll be cutting RC2
> shortly after that.
>
> Michael
>
> On Tue, Dec 8, 2015 at 10:31 AM, Michael Armbrust 
> wrote:
>
>> An update: the vote fails due to the -1.   I'll post another RC as soon
>> as we've resolved these issues.  In the meantime I encourage people to
>> continue testing and post any problems they encounter here.
>>
>> On Sun, Dec 6, 2015 at 6:24 PM, Yin Huai  wrote:
>>
>>> -1
>>>
>>> Two blocker bugs have been found after this RC.
>>> https://issues.apache.org/jira/browse/SPARK-12089 can cause data
>>> corruption when an external sorter spills data.
>>> https://issues.apache.org/jira/browse/SPARK-12155 can prevent tasks
>>> from acquiring memory even when the executor indeed can allocate memory by
>>> evicting storage memory.
>>>
>>> https://issues.apache.org/jira/browse/SPARK-12089 has been fixed. We
>>> are still working on https://issues.apache.org/jira/browse/SPARK-12155.
>>>
>>> On Fri, Dec 4, 2015 at 3:04 PM, Mark Hamstra 
>>> wrote:
>>>
 0

 Currently figuring out who is responsible for the regression that I am
 seeing in some user code ScalaUDFs that make use of Timestamps and where
 NULL from a CSV file read in via a TestHive#registerTestTable is now
 producing 1969-12-31 23:59:59.99 instead of null.

 On Thu, Dec 3, 2015 at 1:57 PM, Sean Owen  wrote:

> Licenses and signature are all fine.
>
> Docker integration tests consistently fail for me with Java 7 / Ubuntu
> and "-Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver"
>
> *** RUN ABORTED ***
>   java.lang.NoSuchMethodError:
>
> org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>   at
> org.glassfish.jersey.apache.connector.ApacheConnector.(ApacheConnector.java:240)
>   at
> org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115)
>   at
> org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418)
>   at
> org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>   at
> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>   at
> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>   at
> org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>   at
> org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>   at
> org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>   at
> org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>
> I also get this failure consistently:
>
> DirectKafkaStreamSuite
> - offset recovery *** FAILED ***
>   recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
> Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
>
> earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,
>
> scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
>
> scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange]
> was false Recovered ranges are not the same as the ones generated
> (DirectKafkaStreamSuite.scala:301)
>
> On Wed, Dec 2, 2015 at 8:26 PM, Michael Armbrust <
> mich...@databricks.com> wrote:
> > Please vote on releasing the following candidate as Apache Spark
> version
> > 1.6.0!
> >
> > The vote is open until Saturday, December 5, 2015 at 21:00 UTC and
> passes if
> > a majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 1.6.0
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see
> http://spark.apache.org/
> >
> > The tag to be voted on is v1.6.0-rc1
> > (bf525845cef159d2d4c9f4d64e158f037179b5c4)
> >
> > The release files, including signatures, digests, etc. can be found
> at:
> >
> http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> >
> https://repository.apache.org/content/repositories/orgapachespark-1165/
> >
> > The test repository (versioned as v1.6.0-rc1) for this release can
> be found
> > at:
> >
>

Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-10 Thread Michael Armbrust
We are getting close to merging patches for SPARK-12155
 and SPARK-12253
.  I'll be cutting RC2
shortly after that.

Michael

On Tue, Dec 8, 2015 at 10:31 AM, Michael Armbrust 
wrote:

> An update: the vote fails due to the -1.   I'll post another RC as soon as
> we've resolved these issues.  In the meantime I encourage people to
> continue testing and post any problems they encounter here.
>
> On Sun, Dec 6, 2015 at 6:24 PM, Yin Huai  wrote:
>
>> -1
>>
>> Two blocker bugs have been found after this RC.
>> https://issues.apache.org/jira/browse/SPARK-12089 can cause data
>> corruption when an external sorter spills data.
>> https://issues.apache.org/jira/browse/SPARK-12155 can prevent tasks from
>> acquiring memory even when the executor indeed can allocate memory by
>> evicting storage memory.
>>
>> https://issues.apache.org/jira/browse/SPARK-12089 has been fixed. We are
>> still working on https://issues.apache.org/jira/browse/SPARK-12155.
>>
>> On Fri, Dec 4, 2015 at 3:04 PM, Mark Hamstra 
>> wrote:
>>
>>> 0
>>>
>>> Currently figuring out who is responsible for the regression that I am
>>> seeing in some user code ScalaUDFs that make use of Timestamps and where
>>> NULL from a CSV file read in via a TestHive#registerTestTable is now
>>> producing 1969-12-31 23:59:59.99 instead of null.
>>>
>>> On Thu, Dec 3, 2015 at 1:57 PM, Sean Owen  wrote:
>>>
 Licenses and signature are all fine.

 Docker integration tests consistently fail for me with Java 7 / Ubuntu
 and "-Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver"

 *** RUN ABORTED ***
   java.lang.NoSuchMethodError:

 org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
   at
 org.glassfish.jersey.apache.connector.ApacheConnector.(ApacheConnector.java:240)
   at
 org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115)
   at
 org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418)
   at
 org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
   at
 org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
   at
 org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
   at
 org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
   at
 org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
   at
 org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
   at
 org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)

 I also get this failure consistently:

 DirectKafkaStreamSuite
 - offset recovery *** FAILED ***
   recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
 Array[org.apache.spark.streaming.kafka.OffsetRange])) =>

 earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,

 scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,

 scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange]
 was false Recovered ranges are not the same as the ones generated
 (DirectKafkaStreamSuite.scala:301)

 On Wed, Dec 2, 2015 at 8:26 PM, Michael Armbrust <
 mich...@databricks.com> wrote:
 > Please vote on releasing the following candidate as Apache Spark
 version
 > 1.6.0!
 >
 > The vote is open until Saturday, December 5, 2015 at 21:00 UTC and
 passes if
 > a majority of at least 3 +1 PMC votes are cast.
 >
 > [ ] +1 Release this package as Apache Spark 1.6.0
 > [ ] -1 Do not release this package because ...
 >
 > To learn more about Apache Spark, please see http://spark.apache.org/
 >
 > The tag to be voted on is v1.6.0-rc1
 > (bf525845cef159d2d4c9f4d64e158f037179b5c4)
 >
 > The release files, including signatures, digests, etc. can be found
 at:
 >
 http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
 >
 > Release artifacts are signed with the following key:
 > https://people.apache.org/keys/committer/pwendell.asc
 >
 > The staging repository for this release can be found at:
 >
 https://repository.apache.org/content/repositories/orgapachespark-1165/
 >
 > The test repository (versioned as v1.6.0-rc1) for this release can be
 found
 > at:
 >
 https://repository.apache.org/content/repositories/orgapachespark-1164/
 >
 > The documentation corresponding to this release can be found at:
 >
 http://people.apache.org/~pwendell/spark-re

Re: A bug in Spark standalone? Worker registration and deregistration

2015-12-10 Thread Jacek Laskowski
On Thu, Dec 10, 2015 at 8:10 PM, Shixiong Zhu  wrote:
> Jacek, could you create a JIRA for it? I just reproduced it. It's a bug in
> how Master handles the Worker disconnection.

Hi Shixiong,

I'm relieved. I kept thinking I was lost in the sources and seeing ghosts :-)

https://issues.apache.org/jira/browse/SPARK-12267

Pozdrawiam,
Jacek

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: A bug in Spark standalone? Worker registration and deregistration

2015-12-10 Thread Shixiong Zhu
Jacek, could you create a JIRA for it? I just reproduced it. It's a bug in
how Master handles the Worker disconnection.

Best Regards,
Shixiong Zhu

2015-12-10 2:45 GMT-08:00 Jacek Laskowski :

> Hi,
>
> I'm on yesterday's master HEAD.
>
> Pozdrawiam,
> Jacek
>
> --
> Jacek Laskowski | https://medium.com/@jaceklaskowski/ |
> http://blog.jaceklaskowski.pl
> Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
> Follow me at https://twitter.com/jaceklaskowski
> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>
>
> On Thu, Dec 10, 2015 at 9:50 AM, Sasaki Kai 
> wrote:
> > Hi, Jacek
> >
> > What version of Spark do you use?
> > I started the sbin/start-master.sh script as you did against master HEAD.
> > But there is no warning log such as you pasted.
> > While you can specify the hostname with the -h option, you can also omit it.
> > The master hostname can be set automatically from the `hostname` command.
> > You can also try it.
> >
> > Kai Sasaki
> >
> >> On Dec 10, 2015, at 5:22 PM, Jacek Laskowski  wrote:
> >>
> >> Hi,
> >>
> >> While toying with Spark Standalone I've noticed the following messages
> >> in the logs of the master:
> >>
> >> INFO Master: Registering worker 192.168.1.6:59919 with 2 cores, 2.0 GB
> RAM
> >> INFO Master: localhost:59920 got disassociated, removing it.
> >> ...
> >> WARN Master: Removing worker-20151210090708-192.168.1.6-59919 because
> >> we got no heartbeat in 60 seconds
> >> INFO Master: Removing worker worker-20151210090708-192.168.1.6-59919
> >> on 192.168.1.6:59919
> >>
> >> Why does the message "WARN Master: Removing
> >> worker-20151210090708-192.168.1.6-59919 because we got no heartbeat in
> >> 60 seconds" appear when the worker should've been gone already (as
> >> pointed out in "INFO Master: localhost:59920 got disassociated,
> >> removing it.")?
> >>
> >> Could it be that the ids are different - 192.168.1.6:59919 vs
> localhost:59920?
> >>
> >> I started master using "./sbin/start-master.sh -h localhost" and the
> >> workers "./sbin/start-slave.sh spark://localhost:7077".
> >>
> >> p.s. Are such questions appropriate for this mailing list?
> >>
> >> Pozdrawiam,
> >> Jacek
> >>
> >> --
> >> Jacek Laskowski | https://medium.com/@jaceklaskowski/ |
> >> http://blog.jaceklaskowski.pl
> >> Mastering Spark
> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
> >> Follow me at https://twitter.com/jaceklaskowski
> >> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: dev-h...@spark.apache.org
> >>
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


RE: Specifying Scala types when calling methods from SparkR

2015-12-10 Thread Chris Freeman
Hi Sun Rui,

I’ve had some luck simply using “objectFile” when saving from SparkR directly. 
The problem is that if you do it that way, the model object will only work if 
you continue to use the current Spark Context, and I think model persistence 
should really enable you to use the model at a later time. That’s where I found 
that I could drop down to the JVM level and interact with the Scala object 
directly, but that seems to only work if you specify the type.



On December 9, 2015 at 7:59:43 PM, Sun, Rui 
(rui@intel.com) wrote:

Hi,

Just use ""objectFile" instead of "objectFile[PipelineModel]" for callJMethod. 
You can take the objectFile() in context.R as example.

Since the SparkContext created in SparkR is actually a JavaSparkContext, there 
is no need to pass the implicit ClassTag.

-Original Message-
From: Shivaram Venkataraman [mailto:shiva...@eecs.berkeley.edu]
Sent: Thursday, December 10, 2015 8:21 AM
To: Chris Freeman
Cc: dev@spark.apache.org
Subject: Re: Specifying Scala types when calling methods from SparkR

The SparkR callJMethod can only invoke methods as they show up in the Java byte 
code. So in this case you'll need to check the SparkContext byte code (with 
javap or something like that) to see how that method looks. My guess is the 
type is passed in as a class tag argument, so you'll need to do something like 
create a class tag for the LinearRegressionModel and pass that in as the first 
or last argument etc.

Thanks
Shivaram

On Wed, Dec 9, 2015 at 10:11 AM, Chris Freeman  wrote:
> Hey everyone,
>
> I’m currently looking at ways to save out SparkML model objects from
> SparkR and I’ve had some luck putting the model into an RDD and then
> saving the RDD as an Object File. Once it’s saved, I’m able to load it
> back in with something like:
>
> sc.objectFile[LinearRegressionModel]("path/to/model")
>
> I’d like to try and replicate this same process from SparkR using the
> JVM backend APIs (e.g. “callJMethod”), but so far I haven’t been able
> to replicate my success and I’m guessing that it’s (at least in part)
> due to the necessity of specifying the type when calling the objectFile 
> method.
>
> Does anyone know if this is actually possible? For example, here’s
> what I’ve come up with so far:
>
> loadModel <- function(sc, modelPath) {
>   modelRDD <- SparkR:::callJMethod(sc,
>                                    "objectFile[PipelineModel]",
>                                    modelPath,
>                                    SparkR:::getMinPartitions(sc, NULL))
>   return(modelRDD)
> }
>
> Any help is appreciated!
>
> --
> Chris Freeman
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional 
commands, e-mail: dev-h...@spark.apache.org



Re: A bug in Spark standalone? Worker registration and deregistration

2015-12-10 Thread Bryan Cutler
Hi Jacek,

I also recently noticed those messages, and some others, and am wondering
if there is an issue.  I am also seeing the following when I have event
logging enabled.  The first application is submitted and executes fine, but
all subsequent attempts produce an error log, and the master fails to load
it.  Not sure if this is related to the messages you see, but I would also
like to know if others can reproduce.  Here are the logs

MASTER
15/12/09 21:19:10 INFO Master: Registering app Spark Pi
15/12/09 21:19:10 INFO Master: Registered app Spark Pi with ID
app-20151209211910-0001
15/12/09 21:19:10 INFO Master: Launching executor app-20151209211910-0001/0
on worker worker-20151209211739-***
15/12/09 21:19:14 INFO Master: Received unregister request from application
app-20151209211910-0001
15/12/09 21:19:14 INFO Master: Removing app app-20151209211910-0001
15/12/09 21:19:14 WARN Master: Application Spark Pi is still in progress,
it may be terminated abnormally.
15/12/09 21:19:14 WARN Master: No event logs found for application Spark Pi
in file:/home/bryan/git/spark/logs/.
15/12/09 21:19:14 INFO Master: localhost.localdomain:54174 got
disassociated, removing it.
15/12/09 21:19:14 WARN Master: Got status update for unknown executor
app-20151209211910-0001/0
15/12/09 21:21:59 WARN Master: Got status update for unknown executor
app-20151209211830-/0
15/12/09 21:22:00 INFO Master: localhost.localdomain:54163 got
disassociated, removing it.

WORKER
15/12/09 21:19:14 INFO Worker: Asked to kill executor
app-20151209211910-0001/0
15/12/09 21:19:14 INFO ExecutorRunner: Runner thread for executor
app-20151209211910-0001/0 interrupted
15/12/09 21:19:14 INFO ExecutorRunner: Killing process!
15/12/09 21:19:14 ERROR FileAppender: Error writing stream to file
/home/bryan/git/spark/work/app-20151209211910-0001/0/stderr
java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at
org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
at
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
at
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1730)
at
org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
15/12/09 21:19:14 INFO Worker: Executor app-20151209211910-0001/0 finished
with state KILLED exitStatus 143
15/12/09 21:19:14 INFO Worker: Cleaning up local directories for
application app-20151209211910-0001
15/12/09 21:19:14 INFO ExternalShuffleBlockResolver: Application
app-20151209211910-0001 removed, cleanupLocalDirs = true


On Thu, Dec 10, 2015 at 2:45 AM, Jacek Laskowski  wrote:

> Hi,
>
> I'm on yesterday's master HEAD.
>
> Pozdrawiam,
> Jacek
>
> --
> Jacek Laskowski | https://medium.com/@jaceklaskowski/ |
> http://blog.jaceklaskowski.pl
> Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
> Follow me at https://twitter.com/jaceklaskowski
> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>
>
> On Thu, Dec 10, 2015 at 9:50 AM, Sasaki Kai 
> wrote:
> > Hi, Jacek
> >
> > What version of Spark do you use?
> > I started the sbin/start-master.sh script as you did against master HEAD.
> > But there is no warning log such as you pasted.
> > While you can specify the hostname with the -h option, you can also omit it.
> > The master hostname can be set automatically from the `hostname` command.
> > You can also try it.
> >
> > Kai Sasaki
> >
> >> On Dec 10, 2015, at 5:22 PM, Jacek Laskowski  wrote:
> >>
> >> Hi,
> >>
> >> While toying with Spark Standalone I've noticed the following messages
> >> in the logs of the master:
> >>
> >> INFO Master: Registering worker 192.168.1.6:59919 with 2 cores, 2.0 GB
> RAM
> >> INFO Master: localhost:59920 got disassociated, removing it.
> >> ...
> >> WARN Master: Removing worker-20151210090708-192.168.1.6-59919 because
> >> we got no heartbeat in 60 seconds
> >> INFO Master: Removing worker worker-20151210090708-192.168.1.6-59919
> >> on 192.168.1.6:59919
> >>
> >> Why does the message "WARN Master: Removing
> >> worker-20151210090708-192.168.1.6-59919 because we got no heartbeat in
> >> 60 seconds" appear when the worker should've been gone already (as
> >> pointed out in "INFO Master: localhost:59920 got disassociated,
> >> removing it.")?
> >>
> >> Could it be that the ids are different - 192.168.1.6:59919 vs
> localhost:59920?
> >>
> >> I started master using "./sbin/start-master.sh -h localhost" and the
> >> workers "./sbin/start-slave.sh spark://localhost:7077".
> >>
> >> p.s. Are such questions appropriate 

Re: [build system] jenkins downtime, thursday 12/10/15 7am PDT

2015-12-10 Thread shane knapp
that probably was me, sorry.  i pulled up the rest api command on my
phone before i fell asleep and must have accidentally put jenkins into
quiet mode.  sorry about that!

On Thu, Dec 10, 2015 at 3:49 AM, Cheng Lian  wrote:
> Hi Shane,
>
> I found that Jenkins has been in the status of "Jenkins is going to shut
> down" for at least 4 hours (from ~23:30 Dec 9 to 3:45 Dec 10, PDT). Not sure
> whether this is part of the schedule or related?
>
> Cheng
>
> On Thu, Dec 10, 2015 at 3:56 AM, shane knapp  wrote:
>>
>> here's the security advisory for the update:
>>
>> https://wiki.jenkins-ci.org/display/SECURITY/Jenkins+Security+Advisory+2015-12-09
>>
>> On Wed, Dec 9, 2015 at 9:55 AM, shane knapp  wrote:
>> > reminder!  this is happening tomorrow morning.
>> >
>> > On Wed, Dec 2, 2015 at 7:20 PM, shane knapp  wrote:
>> >> there's Yet Another Jenkins Security Advisory[tm], and a big release
>> >> to patch it all coming out next wednesday.
>> >>
>> >> to that end i will be performing a jenkins update, as well as
>> >> performing the work to resolve the following jira issue:
>> >> https://issues.apache.org/jira/browse/SPARK-11255
>> >>
>> >> i will put jenkins in to quiet mode around 6am, start work around 7am
>> >> and expect everything to be back up and building before 9am.  i'll
>> >> post updates as things progress.
>> >>
>> >> please let me know ASAP if there's any problem with this schedule.
>> >>
>> >> shane
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Spark Streaming Kinesis - DynamoDB Streams compatability

2015-12-10 Thread Nick Pentreath
Hi Spark users & devs

I was just wondering if anyone out there has interest in DynamoDB Streams (
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html)
as an input source for Spark Streaming Kinesis?

Because DynamoDB Streams provides an adaptor client that works with the
KCL, making this work is fairly straightforward, but would require a little
bit of work to add it to Spark Streaming Kinesis as an option. It also
requires updating the AWS SDK version.

For those using AWS heavily, there are other ways of achieving the same
outcome indirectly, the easiest of which I've found so far is using AWS
Lambdas to read from the DynamoDB Stream, (optionally) transform the
events, and write to a Kinesis stream, allowing one to just use the
existing Spark integration. Still, I'd like to know if there is sufficient
interest or demand for this among the user base to work on a PR adding
DynamoDB Streams support to Spark.
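
For what it's worth, here is a minimal sketch of the consumption side of that
indirect route, i.e. reading the Lambda-populated Kinesis stream with the
existing integration (stream, app and region names are made up; the
createStream signature is the one in the Spark 1.6 spark-streaming-kinesis-asl
module):

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

val conf = new SparkConf().setAppName("dynamodb-via-kinesis")  // master set via spark-submit
val ssc = new StreamingContext(conf, Seconds(10))

// Records written by the bridging Lambda arrive as Array[Byte]; the DynamoDB
// change events would be decoded from that payload.
val events = KinesisUtils.createStream(
  ssc, "dynamodb-bridge-app", "dynamodb-bridge-stream",
  "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
  InitialPositionInStream.LATEST, Seconds(10), StorageLevel.MEMORY_AND_DISK_2)

events.map(bytes => new String(bytes, "UTF-8")).print()

ssc.start()
ssc.awaitTermination()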

(At the same time, the implementation details happen to provide an
opportunity to address https://issues.apache.org/jira/browse/SPARK-10969,
though not sure how much need there is for that either?)

N


Re: [build system] jenkins downtime, thursday 12/10/15 7am PDT

2015-12-10 Thread shane knapp
and we're done!  this was a quick one.  :)

On Thu, Dec 10, 2015 at 6:54 AM, shane knapp  wrote:
> jenkins is done, but we'll also be updating the firewall.  this
> shouldn't take very long and i'll let everyone know when we're done.
>
> On Thu, Dec 10, 2015 at 6:35 AM, shane knapp  wrote:
>> this is happening now.
>>
>> On Wed, Dec 9, 2015 at 11:56 AM, shane knapp  wrote:
>>> here's the security advisory for the update:
>>> https://wiki.jenkins-ci.org/display/SECURITY/Jenkins+Security+Advisory+2015-12-09
>>>
>>> On Wed, Dec 9, 2015 at 9:55 AM, shane knapp  wrote:
 reminder!  this is happening tomorrow morning.

 On Wed, Dec 2, 2015 at 7:20 PM, shane knapp  wrote:
> there's Yet Another Jenkins Security Advisory[tm], and a big release
> to patch it all coming out next wednesday.
>
> to that end i will be performing a jenkins update, as well as
> performing the work to resolve the following jira issue:
> https://issues.apache.org/jira/browse/SPARK-11255
>
> i will put jenkins in to quiet mode around 6am, start work around 7am
> and expect everything to be back up and building before 9am.  i'll
> post updates as things progress.
>
> please let me know ASAP if there's any problem with this schedule.
>
> shane

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [build system] jenkins downtime, thursday 12/10/15 7am PDT

2015-12-10 Thread shane knapp
jenkins is done, but we'll also be updating the firewall.  this
shouldn't take very long and i'll let everyone know when we're done.

On Thu, Dec 10, 2015 at 6:35 AM, shane knapp  wrote:
> this is happening now.
>
> On Wed, Dec 9, 2015 at 11:56 AM, shane knapp  wrote:
>> here's the security advisory for the update:
>> https://wiki.jenkins-ci.org/display/SECURITY/Jenkins+Security+Advisory+2015-12-09
>>
>> On Wed, Dec 9, 2015 at 9:55 AM, shane knapp  wrote:
>>> reminder!  this is happening tomorrow morning.
>>>
>>> On Wed, Dec 2, 2015 at 7:20 PM, shane knapp  wrote:
 there's Yet Another Jenkins Security Advisory[tm], and a big release
 to patch it all coming out next wednesday.

 to that end i will be performing a jenkins update, as well as
 performing the work to resolve the following jira issue:
 https://issues.apache.org/jira/browse/SPARK-11255

 i will put jenkins in to quiet mode around 6am, start work around 7am
 and expect everything to be back up and building before 9am.  i'll
 post updates as things progress.

 please let me know ASAP if there's any problem with this schedule.

 shane

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [build system] jenkins downtime, thursday 12/10/15 7am PDT

2015-12-10 Thread shane knapp
this is happening now.

On Wed, Dec 9, 2015 at 11:56 AM, shane knapp  wrote:
> here's the security advisory for the update:
> https://wiki.jenkins-ci.org/display/SECURITY/Jenkins+Security+Advisory+2015-12-09
>
> On Wed, Dec 9, 2015 at 9:55 AM, shane knapp  wrote:
>> reminder!  this is happening tomorrow morning.
>>
>> On Wed, Dec 2, 2015 at 7:20 PM, shane knapp  wrote:
>>> there's Yet Another Jenkins Security Advisory[tm], and a big release
>>> to patch it all coming out next wednesday.
>>>
>>> to that end i will be performing a jenkins update, as well as
>>> performing the work to resolve the following jira issue:
>>> https://issues.apache.org/jira/browse/SPARK-11255
>>>
>>> i will put jenkins in to quiet mode around 6am, start work around 7am
>>> and expect everything to be back up and building before 9am.  i'll
>>> post updates as things progress.
>>>
>>> please let me know ASAP if there's any problem with this schedule.
>>>
>>> shane

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Does RDD[Type1, Iterable[Type2]] split into multiple partitions?

2015-12-10 Thread Reynold Xin
No, since the signature itself limits it.


On Thu, Dec 10, 2015 at 9:19 PM, JaeSung Jun  wrote:

> Hi,
>
> I'm currently working on an Iterable-valued RDD, something like:
>
> val keyValueIterableRDD: RDD[(CaseClass1, Iterable[CaseClass2])] = buildRDD(...)
>
> If there is only one unique key and the Iterable is big enough, would this
> Iterable be partitioned across all executors like the following?
>
> (executor1)
> (xxx, iterator from 0 to 10,000)
>
> (executor2)
> (xxx, iterator from 10,001 to 20,000)
>
> (executor3)
> (xxx, iterator from 20,001 to 30,000)
>
> ...
>
> Thanks
> Jason
>
>
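
To illustrate the point in spark-shell (hypothetical key and counts): each
(key, Iterable) pair is a single record, so the whole Iterable lives in
whichever partition its key lands in, and only a different formulation such
as reduceByKey spreads the work across executors.

// One key, many values: groupByKey puts the entire Iterable into one partition.
val pairs = sc.parallelize((1 to 30000).map(i => ("xxx", i)), 6)
val grouped = pairs.groupByKey()   // RDD[(String, Iterable[Int])]
grouped.mapPartitions(it => Iterator(it.size)).collect()
// -> one partition holds the single ("xxx", <30000 values>) record, the rest are empty

// If only an aggregate is needed, combine map-side instead of materializing the Iterable:
val summed = pairs.reduceByKey(_ + _)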


Does RDD[Type1, Iterable[Type2]] split into multiple partitions?

2015-12-10 Thread JaeSung Jun
Hi,

I'm currently working on an Iterable-valued RDD, something like:

val keyValueIterableRDD: RDD[(CaseClass1, Iterable[CaseClass2])] = buildRDD(...)

If there is only one unique key and the Iterable is big enough, would this
Iterable be partitioned across all executors like the following?

(executor1)
(xxx, iterator from 0 to 10,000)

(executor2)
(xxx, iterator from 10,001 to 20,000)

(executor3)
(xxx, iterator from 20,001 to 30,000)

...

Thanks
Jason


Re: [build system] jenkins downtime, thursday 12/10/15 7am PDT

2015-12-10 Thread Cheng Lian
Hi Shane,

I found that Jenkins has been in the status of "Jenkins is going to shut
down" for at least 4 hours (from ~23:30 Dec 9 to 3:45 Dec 10, PDT). Not
sure whether this is part of the schedule or related?

Cheng

On Thu, Dec 10, 2015 at 3:56 AM, shane knapp  wrote:

> here's the security advisory for the update:
>
> https://wiki.jenkins-ci.org/display/SECURITY/Jenkins+Security+Advisory+2015-12-09
>
> On Wed, Dec 9, 2015 at 9:55 AM, shane knapp  wrote:
> > reminder!  this is happening tomorrow morning.
> >
> > On Wed, Dec 2, 2015 at 7:20 PM, shane knapp  wrote:
> >> there's Yet Another Jenkins Security Advisory[tm], and a big release
> >> to patch it all coming out next wednesday.
> >>
> >> to that end i will be performing a jenkins update, as well as
> >> performing the work to resolve the following jira issue:
> >> https://issues.apache.org/jira/browse/SPARK-11255
> >>
> >> i will put jenkins in to quiet mode around 6am, start work around 7am
> >> and expect everything to be back up and building before 9am.  i'll
> >> post updates as things progress.
> >>
> >> please let me know ASAP if there's any problem with this schedule.
> >>
> >> shane
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: releasing Spark 1.4.2

2015-12-10 Thread Inosh Goonewardena
Hi,

I can see that a 1.4.2 RC has already been prepared[1], but the vote was never
called for some reason. Was there any particular reason for stopping the voting
process? Just wondering whether we can get 1.4.2 released.

[1]
https://github.com/apache/spark/commit/0b22a3c7a3a40ff63a2e740ecab152141271b30d



On Mon, Nov 16, 2015 at 2:21 PM, Ted Yu  wrote:

> See this thread:
>
>
> http://search-hadoop.com/m/q3RTtLKc2ctNPcq&subj=Re+Spark+1+4+2+release+and+votes+conversation+
>
> On Nov 15, 2015, at 10:53 PM, Niranda Perera 
> wrote:
>
> Hi,
>
> I am wondering when spark 1.4.2 will be released?
>
> is it in the voting stage at the moment?
>
> rgds
>
> --
> Niranda
> @n1r44 
> +94-71-554-8430
> https://pythagoreanscript.wordpress.com/
>
>


Re: A bug in Spark standalone? Worker registration and deregistration

2015-12-10 Thread Jacek Laskowski
Hi,

I'm on yesterday's master HEAD.

Pozdrawiam,
Jacek

--
Jacek Laskowski | https://medium.com/@jaceklaskowski/ |
http://blog.jaceklaskowski.pl
Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Follow me at https://twitter.com/jaceklaskowski
Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski


On Thu, Dec 10, 2015 at 9:50 AM, Sasaki Kai  wrote:
> Hi, Jacek
>
> What version of Spark do you use?
> I started the sbin/start-master.sh script as you did against master HEAD. But 
> there is no warning log such as you pasted.
> While you can specify the hostname with the -h option, you can also omit it. The 
> master hostname can be set automatically from the `hostname` command. You can 
> also try it.
>
> Kai Sasaki
>
>> On Dec 10, 2015, at 5:22 PM, Jacek Laskowski  wrote:
>>
>> Hi,
>>
>> While toying with Spark Standalone I've noticed the following messages
>> in the logs of the master:
>>
>> INFO Master: Registering worker 192.168.1.6:59919 with 2 cores, 2.0 GB RAM
>> INFO Master: localhost:59920 got disassociated, removing it.
>> ...
>> WARN Master: Removing worker-20151210090708-192.168.1.6-59919 because
>> we got no heartbeat in 60 seconds
>> INFO Master: Removing worker worker-20151210090708-192.168.1.6-59919
>> on 192.168.1.6:59919
>>
>> Why does the message "WARN Master: Removing
>> worker-20151210090708-192.168.1.6-59919 because we got no heartbeat in
>> 60 seconds" appear when the worker should've been gone already (as
>> pointed out in "INFO Master: localhost:59920 got disassociated,
>> removing it.")?
>>
>> Could it be that the ids are different - 192.168.1.6:59919 vs 
>> localhost:59920?
>>
>> I started master using "./sbin/start-master.sh -h localhost" and the
>> workers "./sbin/start-slave.sh spark://localhost:7077".
>>
>> p.s. Are such questions appropriate for this mailing list?
>>
>> Pozdrawiam,
>> Jacek
>>
>> --
>> Jacek Laskowski | https://medium.com/@jaceklaskowski/ |
>> http://blog.jaceklaskowski.pl
>> Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
>> Follow me at https://twitter.com/jaceklaskowski
>> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [build system] jenkins downtime, thursday 12/10/15 7am PDT

2015-12-10 Thread Steve Loughran
Note also that ASF JIRA goes down at 19:00 UTC on Friday; it *should* be back 
up ... the @infrabot twitter feed will be the best channel for updates


-- Forwarded message --
From: Daniel Takamori 
Date: 8 December 2015 at 18:59
Subject: Jira reboot 11-12-2015 at 19:00 UTC
To: operati...@apache.org, infrastruct...@apache.org


There will be a planned reboot of Jira on Friday 11th December at 19:00 UTC.

This is 72 hours notice as recommended in our Core Services planned downtime 
SLA.

Currently, Jira requires a reboot when adding new projects to it. There is an 
outstanding ticket with Atlassian about this. They require logs and so these 
will be gathered at the time of the planned reboot. 

Projects being added to Jira at this time will include:-

INFRA-10905 - New JIRA for Metron

and any more that get requested between now and downtime.

Any projects requiring issues to be imported from other issue trackers will NOT 
be done at this time.

A tweet via @infrabot will be tweeted 24 hrs and 1 hr before.
A planned maintenance notice will be posted on status.apache.org.

Actual downtime should be no more than 10 minutes all being well.

The next email about this will be after the service has resumed from the 
planned downtime.

Thanks!

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



A bug in Spark standalone? Worker registration and deregistration

2015-12-10 Thread Jacek Laskowski
Hi,

While toying with Spark Standalone I've noticed the following messages
in the logs of the master:

INFO Master: Registering worker 192.168.1.6:59919 with 2 cores, 2.0 GB RAM
INFO Master: localhost:59920 got disassociated, removing it.
...
WARN Master: Removing worker-20151210090708-192.168.1.6-59919 because
we got no heartbeat in 60 seconds
INFO Master: Removing worker worker-20151210090708-192.168.1.6-59919
on 192.168.1.6:59919

Why does the message "WARN Master: Removing
worker-20151210090708-192.168.1.6-59919 because we got no heartbeat in
60 seconds" appear when the worker should've been gone already (as
pointed out in "INFO Master: localhost:59920 got disassociated,
removing it.")?

Could it be that the ids are different - 192.168.1.6:59919 vs localhost:59920?

I started master using "./sbin/start-master.sh -h localhost" and the
workers "./sbin/start-slave.sh spark://localhost:7077".

p.s. Are such questions appropriate for this mailing list?

Pozdrawiam,
Jacek

--
Jacek Laskowski | https://medium.com/@jaceklaskowski/ |
http://blog.jaceklaskowski.pl
Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Follow me at https://twitter.com/jaceklaskowski
Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org