Re: rdd.distinct with Partitioner

2016-06-08 Thread 汪洋
Frankly speaking, I think reduceByKey with a Partitioner has the same problem,
and it should not be exposed to public users either, because it is a little
hard to fully understand how the partitioner behaves without looking at the
actual code.

And if there exists a basic contract for a Partitioner, maybe it should be
stated explicitly in the documentation if it is not enforced by code.

However, I don't feel strongly enough to argue this issue beyond stating my
concern. It will not cause too much trouble anyway once users learn the
semantics. It is just a judgement call for the API designer.
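
For illustration, the contract in question could be stated as: getPartition
must be a pure function of the key, so that equal keys always map to the same
partition. Below is a minimal sketch (not from the thread; the RandomPartitioner
class is hypothetical and exists only to show a violation):

import scala.util.Random
import org.apache.spark.{HashPartitioner, Partitioner}

// Hypothetical contract-breaking partitioner: equal keys may be sent to
// different partitions on different calls.
class RandomPartitioner(parts: Int) extends Partitioner {
  override def numPartitions: Int = parts
  override def getPartition(key: Any): Int = Random.nextInt(parts)
}

// HashPartitioner honors the contract (same key => same partition), so
// reduceByKey sees all values for a key in one place. With RandomPartitioner,
// duplicates of a key can land in different partitions and are never merged,
// which is why a random partitioner produces wrong results.
val ok: Partitioner = new HashPartitioner(4)
val broken: Partitioner = new RandomPartitioner(4)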


> On Jun 9, 2016, at 12:51 PM, Alexander Pivovarov wrote:
> 
> reduceByKey(randomPartitioner, (a, b) => a + b) also gives incorrect result 
> 
> Why reduceByKey with Partitioner exists then?
> 
> On Wed, Jun 8, 2016 at 9:22 PM, 汪洋 wrote:
> Hi Alexander,
> 
> I think it does not guarantee to be right if an arbitrary Partitioner is 
> passed in.
> 
> I have created a notebook and you can check it out:
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/7973071962862063/2110745399505739/58107563000366/latest.html
> 
> Best regards,
> 
> Yang
> 
> 
>> On Jun 9, 2016, at 11:42 AM, Alexander Pivovarov wrote:
>> 
>> most of the RDD methods which shuffle data take Partitioner as a parameter
>> 
>> But rdd.distinct does not have such signature
>> 
>> Should I open a PR for that?
>> 
>> /**
>>  * Return a new RDD containing the distinct elements in this RDD.
>>  */
>> def distinct(partitioner: Partitioner)(implicit ord: Ordering[T] = null): 
>> RDD[T] = withScope {
>>   map(x => (x, null)).reduceByKey(partitioner, (x, y) => x).map(_._1)
>> }
> 
> 



DAG in Pipeline

2016-06-08 Thread Pranay Tonpay
Hi,
As of now, a Pipeline seems to be a series of transformers and
estimators run in a serial fashion.
Is it possible to create a DAG sort of thing?
E.g., two transformers running in parallel to cleanse data in some way
(each a custom-built Transformer), and then their two outputs used for
some sort of correlation (another custom-built Transformer).

Let me know -

thx
pranay
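
For context, a minimal sketch of the current serial Pipeline API that the
question is contrasting against (the column names and the trainingDF input are
placeholders):

import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// Today a Pipeline is a linear chain: each stage consumes the single
// DataFrame produced by the previous stage.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
val pipeline  = new Pipeline().setStages(Array[PipelineStage](tokenizer, hashingTF))

// pipeline.fit(trainingDF) would run tokenizer, then hashingTF; there is no
// built-in way to declare two cleansing Transformers that run in parallel and
// a third Transformer that consumes both of their outputs (the DAG shape asked
// about above).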


Re: rdd.distinct with Partitioner

2016-06-08 Thread Alexander Pivovarov
reduceByKey(randomPartitioner, (a, b) => a + b) also gives an incorrect result.

Why does reduceByKey with a Partitioner exist then?

On Wed, Jun 8, 2016 at 9:22 PM, 汪洋  wrote:

> Hi Alexander,
>
> I think it does not guarantee to be right if an arbitrary Partitioner is
> passed in.
>
> I have created a notebook and you can check it out. (
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/7973071962862063/2110745399505739/58107563000366/latest.html
> )
>
> Best regards,
>
> Yang
>
>
> On Jun 9, 2016, at 11:42 AM, Alexander Pivovarov wrote:
>
> most of the RDD methods which shuffle data take Partitioner as a parameter
>
> But rdd.distinct does not have such signature
>
> Should I open a PR for that?
>
> /**
>  * Return a new RDD containing the distinct elements in this RDD.
>  */
>
> def distinct(partitioner: Partitioner)(implicit ord: Ordering[T] = null): 
> RDD[T] = withScope {
>   map(x => (x, null)).reduceByKey(partitioner, (x, y) => x).map(_._1)
> }
>
>
>


Re: rdd.distinct with Partitioner

2016-06-08 Thread Mridul Muralidharan
The example violates the basic contract of a Partitioner.
It does make sense to take Partitioner as a param to distinct - though it
is fairly trivial to simulate that in user code as well ...

Regards
Mridul
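
A minimal sketch of the user-code simulation mentioned above, assuming the
supplied partitioner honors the contract (the distinctWith helper and someRdd
are made-up names):

import scala.reflect.ClassTag
import org.apache.spark.{HashPartitioner, Partitioner}
import org.apache.spark.rdd.RDD

// Simulate distinct(partitioner) in user code: key each element by itself,
// reduce per key with the given partitioner (keeping one of the duplicates),
// then drop the key again.
def distinctWith[T: ClassTag](rdd: RDD[T], partitioner: Partitioner): RDD[T] =
  rdd.map(x => (x, null)).reduceByKey(partitioner, (x, _) => x).map(_._1)

// e.g. distinctWith(someRdd, new HashPartitioner(16))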


On Wednesday, June 8, 2016, 汪洋  wrote:

> Hi Alexander,
>
> I think it does not guarantee to be right if an arbitrary Partitioner is
> passed in.
>
> I have created a notebook and you can check it out. (
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/7973071962862063/2110745399505739/58107563000366/latest.html
> )
>
> Best regards,
>
> Yang
>
>
> On Jun 9, 2016, at 11:42 AM, Alexander Pivovarov wrote:
>
> most of the RDD methods which shuffle data take Partitioner as a parameter
>
> But rdd.distinct does not have such signature
>
> Should I open a PR for that?
>
> /**
>  * Return a new RDD containing the distinct elements in this RDD.
>  */
>
> def distinct(partitioner: Partitioner)(implicit ord: Ordering[T] = null): 
> RDD[T] = withScope {
>   map(x => (x, null)).reduceByKey(partitioner, (x, y) => x).map(_._1)
> }
>
>
>


Re: rdd.distinct with Partitioner

2016-06-08 Thread 汪洋
Hi Alexander,

I think it is not guaranteed to be correct if an arbitrary Partitioner is
passed in.

I have created a notebook and you can check it out:
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/7973071962862063/2110745399505739/58107563000366/latest.html

Best regards,

Yang


> On Jun 9, 2016, at 11:42 AM, Alexander Pivovarov wrote:
> 
> most of the RDD methods which shuffle data take Partitioner as a parameter
> 
> But rdd.distinct does not have such signature
> 
> Should I open a PR for that?
> 
> /**
>  * Return a new RDD containing the distinct elements in this RDD.
>  */
> def distinct(partitioner: Partitioner)(implicit ord: Ordering[T] = null): 
> RDD[T] = withScope {
>   map(x => (x, null)).reduceByKey(partitioner, (x, y) => x).map(_._1)
> }



rdd.distinct with Partitioner

2016-06-08 Thread Alexander Pivovarov
Most of the RDD methods which shuffle data take a Partitioner as a parameter.

But rdd.distinct does not have such a signature.

Should I open a PR for that?

/**
 * Return a new RDD containing the distinct elements in this RDD.
 */
def distinct(partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T] = withScope {
  map(x => (x, null)).reduceByKey(partitioner, (x, y) => x).map(_._1)
}


Re: Kryo registration for Tuples?

2016-06-08 Thread Reynold Xin
Yes you can :)
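
For reference, a minimal sketch of that workflow with a plain SparkConf
(the rest of the job setup is omitted):

import org.apache.spark.SparkConf

// With spark.kryo.registrationRequired=true, Kryo fails fast with an error
// naming any class that is serialized without being registered, so the
// missing classes can be collected from the error messages and registered
// iteratively until the job runs clean.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")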


On Wed, Jun 8, 2016 at 6:00 PM, Alexander Pivovarov 
wrote:

> Can I just enable spark.kryo.registrationRequired and look at error
> messages to get unregistered classes?
>
> On Wed, Jun 8, 2016 at 5:52 PM, Reynold Xin  wrote:
>
>> Due to type erasure they have no difference, although watch out for Scala
>> tuple serialization.
>>
>>
>> On Wednesday, June 8, 2016, Ted Yu  wrote:
>>
>>> I think the second group (3 classOf's) should be used.
>>>
>>> Cheers
>>>
>>> On Wed, Jun 8, 2016 at 4:53 PM, Alexander Pivovarov <
>>> apivova...@gmail.com> wrote:
>>>
 if my RDD is RDD[(String, (Long, MyClass))]

 Do I need to register

 classOf[MyClass]
 classOf[(Any, Any)]

 or

 classOf[MyClass]
 classOf[(Long, MyClass)]
 classOf[(String, (Long, MyClass))]

 ?


>>>
>


Re: Kryo registration for Tuples?

2016-06-08 Thread Alexander Pivovarov
Can I just enable spark.kryo.registrationRequired and look at error
messages to get unregistered classes?

On Wed, Jun 8, 2016 at 5:52 PM, Reynold Xin  wrote:

> Due to type erasure they have no difference, although watch out for Scala
> tuple serialization.
>
>
> On Wednesday, June 8, 2016, Ted Yu  wrote:
>
>> I think the second group (3 classOf's) should be used.
>>
>> Cheers
>>
>> On Wed, Jun 8, 2016 at 4:53 PM, Alexander Pivovarov wrote:
>>
>>> if my RDD is RDD[(String, (Long, MyClass))]
>>>
>>> Do I need to register
>>>
>>> classOf[MyClass]
>>> classOf[(Any, Any)]
>>>
>>> or
>>>
>>> classOf[MyClass]
>>> classOf[(Long, MyClass)]
>>> classOf[(String, (Long, MyClass))]
>>>
>>> ?
>>>
>>>
>>


Re: Kryo registration for Tuples?

2016-06-08 Thread Reynold Xin
Due to type erasure they have no difference, although watch out for Scala
tuple serialization.

On Wednesday, June 8, 2016, Ted Yu  wrote:

> I think the second group (3 classOf's) should be used.
>
> Cheers
>
> On Wed, Jun 8, 2016 at 4:53 PM, Alexander Pivovarov wrote:
>
>> if my RDD is RDD[(String, (Long, MyClass))]
>>
>> Do I need to register
>>
>> classOf[MyClass]
>> classOf[(Any, Any)]
>>
>> or
>>
>> classOf[MyClass]
>> classOf[(Long, MyClass)]
>> classOf[(String, (Long, MyClass))]
>>
>> ?
>>
>>
>


Re: Kryo registration for Tuples?

2016-06-08 Thread Ted Yu
I think the second group (3 classOf's) should be used.

Cheers

On Wed, Jun 8, 2016 at 4:53 PM, Alexander Pivovarov 
wrote:

> if my RDD is RDD[(String, (Long, MyClass))]
>
> Do I need to register
>
> classOf[MyClass]
> classOf[(Any, Any)]
>
> or
>
> classOf[MyClass]
> classOf[(Long, MyClass)]
> classOf[(String, (Long, MyClass))]
>
> ?
>
>


Kryo registration for Tuples?

2016-06-08 Thread Alexander Pivovarov
if my RDD is RDD[(String, (Long, MyClass))]

Do I need to register

classOf[MyClass]
classOf[(Any, Any)]

or

classOf[MyClass]
classOf[(Long, MyClass)]
classOf[(String, (Long, MyClass))]

?
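
A sketch of what the registration could look like either way (MyClass below is
a stand-in for the class in the question); because of type erasure, all of the
tuple variants resolve to the same runtime class, scala.Tuple2:

import org.apache.spark.SparkConf

// Stand-in for the MyClass mentioned above.
case class MyClass(value: Long)

val conf = new SparkConf().registerKryoClasses(Array(
  classOf[MyClass],
  // At runtime these are all classOf[scala.Tuple2], so the (Any, Any) form
  // and the fully spelled-out forms register the same class.
  classOf[(Any, Any)],
  classOf[(Long, MyClass)],
  classOf[(String, (Long, MyClass))]
))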


Re: NegativeArraySizeException / segfault

2016-06-08 Thread Andres Perez
We were able to reproduce it with a minimal example. I've opened a jira
issue:

https://issues.apache.org/jira/browse/SPARK-15825

On Wed, Jun 8, 2016 at 12:43 PM, Koert Kuipers  wrote:

> great!
>
> we weren't able to reproduce it because the unit tests use a
> broadcast-join while on the cluster it uses sort-merge-join. so the issue
> is in sort-merge-join.
>
> we are now able to reproduce it in tests using
> spark.sql.autoBroadcastJoinThreshold=-1
> it produces weird looking garbled results in the join.
> hoping to get a minimal reproducible example soon.
>
> On Wed, Jun 8, 2016 at 10:24 AM, Pete Robbins  wrote:
>
>> I just raised https://issues.apache.org/jira/browse/SPARK-15822 for a
>> similar looking issue. Analyzing the core dump from the segv with Memory
>> Analyzer it looks very much like a UTF8String is very corrupt.
>>
>> Cheers,
>>
>>
>> On Fri, 27 May 2016 at 21:00 Koert Kuipers  wrote:
>>
>>> hello all,
>>> after getting our unit tests to pass on spark 2.0.0-SNAPSHOT we are now
>>> trying to run some algorithms at scale on our cluster.
>>> unfortunately this means that when i see errors i am having a harder
>>> time boiling it down to a small reproducible example.
>>>
>>> today we are running an iterative algo using the dataset api and we are
>>> seeing tasks fail with errors which seem to be related to unsafe operations.
>>> the same tasks succeed without issues in our unit tests.
>>>
>>> i see either:
>>>
>>> 16/05/27 12:54:46 ERROR executor.Executor: Exception in task 31.0 in
>>> stage 21.0 (TID 1073)
>>> java.lang.NegativeArraySizeException
>>> at
>>> org.apache.spark.unsafe.types.UTF8String.getBytes(UTF8String.java:229)
>>> at
>>> org.apache.spark.unsafe.types.UTF8String.toString(UTF8String.java:821)
>>> at
>>> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown
>>> Source)
>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>>> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>>> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>>> at
>>> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown
>>> Source)
>>> at
>>> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>>> Source)
>>> at
>>> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>>> at
>>> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:359)
>>> at
>>> org.apache.spark.sql.execution.aggregate.SortBasedAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortBasedAggregateExec.scala:74)
>>> at
>>> org.apache.spark.sql.execution.aggregate.SortBasedAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortBasedAggregateExec.scala:71)
>>> at
>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:775)
>>> at
>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:775)
>>> at
>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>> at
>>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>>> at
>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>> at
>>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>>> at
>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
>>> at
>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:85)
>>> at
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>
>>> or alternatively:
>>>
>>> # A fatal error has been detected by the Java Runtime Environment:
>>> #
>>> #  SIGSEGV (0xb) at pc=0x7fe571041cba, pid=2450, tid=140622965913344
>>> #
>>> # JRE version: Java(TM) SE Runtime Environment (7.0_75-b13) (build
>>> 1.7.0_75-b13)
>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.75-b04 mixed mode
>>> linux-amd64 compressed oops)
>>> # Problematic frame:
>>> # v  ~StubRoutines::jbyte_disjoint_arraycopy

Re: NegativeArraySizeException / segfault

2016-06-08 Thread Koert Kuipers
great!

we weren't able to reproduce it because the unit tests use a broadcast-join
while on the cluster it uses sort-merge-join. so the issue is in
sort-merge-join.

we are now able to reproduce it in tests using
spark.sql.autoBroadcastJoinThreshold=-1
it produces weird looking garbled results in the join.
hoping to get a minimal reproducible example soon.
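
For anyone trying the same reproduction, a sketch of forcing sort-merge join in
a local test session (assumes the Spark 2.0 SparkSession API; names are
placeholders):

import org.apache.spark.sql.SparkSession

// Setting the broadcast threshold to -1 disables automatic broadcast joins,
// so the planner falls back to sort-merge join even for tiny test inputs,
// matching the plan the job gets on the cluster.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("smj-repro")
  .config("spark.sql.autoBroadcastJoinThreshold", "-1")
  .getOrCreate()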

On Wed, Jun 8, 2016 at 10:24 AM, Pete Robbins  wrote:

> I just raised https://issues.apache.org/jira/browse/SPARK-15822 for a
> similar looking issue. Analyzing the core dump from the segv with Memory
> Analyzer it looks very much like a UTF8String is very corrupt.
>
> Cheers,
>
>
> On Fri, 27 May 2016 at 21:00 Koert Kuipers  wrote:
>
>> hello all,
>> after getting our unit tests to pass on spark 2.0.0-SNAPSHOT we are now
>> trying to run some algorithms at scale on our cluster.
>> unfortunately this means that when i see errors i am having a harder time
>> boiling it down to a small reproducible example.
>>
>> today we are running an iterative algo using the dataset api and we are
>> seeing tasks fail with errors which seem to be related to unsafe operations.
>> the same tasks succeed without issues in our unit tests.
>>
>> i see either:
>>
>> 16/05/27 12:54:46 ERROR executor.Executor: Exception in task 31.0 in
>> stage 21.0 (TID 1073)
>> java.lang.NegativeArraySizeException
>> at
>> org.apache.spark.unsafe.types.UTF8String.getBytes(UTF8String.java:229)
>> at
>> org.apache.spark.unsafe.types.UTF8String.toString(UTF8String.java:821)
>> at
>> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown
>> Source)
>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>> at
>> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown
>> Source)
>> at
>> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>> Source)
>> at
>> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>> at
>> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:359)
>> at
>> org.apache.spark.sql.execution.aggregate.SortBasedAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortBasedAggregateExec.scala:74)
>> at
>> org.apache.spark.sql.execution.aggregate.SortBasedAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortBasedAggregateExec.scala:71)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:775)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:775)
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>> at
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
>> at
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
>> at org.apache.spark.scheduler.Task.run(Task.scala:85)
>> at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>
>> or alternatively:
>>
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  SIGSEGV (0xb) at pc=0x7fe571041cba, pid=2450, tid=140622965913344
>> #
>> # JRE version: Java(TM) SE Runtime Environment (7.0_75-b13) (build
>> 1.7.0_75-b13)
>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.75-b04 mixed mode
>> linux-amd64 compressed oops)
>> # Problematic frame:
>> # v  ~StubRoutines::jbyte_disjoint_arraycopy
>>
>> i assume the best thing would be to try to get it to print out the
>> generated code that is causing this?
>> what switch do i need to use again to do so?
>> thanks,
>> koert
>>
>


Re: NegativeArraySizeException / segfault

2016-06-08 Thread Pete Robbins
I just raised https://issues.apache.org/jira/browse/SPARK-15822 for a
similar looking issue. Analyzing the core dump from the segv with Memory
Analyzer it looks very much like a UTF8String is very corrupt.

Cheers,

On Fri, 27 May 2016 at 21:00 Koert Kuipers  wrote:

> hello all,
> after getting our unit tests to pass on spark 2.0.0-SNAPSHOT we are now
> trying to run some algorithms at scale on our cluster.
> unfortunately this means that when i see errors i am having a harder time
> boiling it down to a small reproducible example.
>
> today we are running an iterative algo using the dataset api and we are
> seeing tasks fail with errors which seem to be related to unsafe operations.
> the same tasks succeed without issues in our unit tests.
>
> i see either:
>
> 16/05/27 12:54:46 ERROR executor.Executor: Exception in task 31.0 in stage
> 21.0 (TID 1073)
> java.lang.NegativeArraySizeException
> at
> org.apache.spark.unsafe.types.UTF8String.getBytes(UTF8String.java:229)
> at
> org.apache.spark.unsafe.types.UTF8String.toString(UTF8String.java:821)
> at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown
> Source)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown
> Source)
> at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
> Source)
> at
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:359)
> at
> org.apache.spark.sql.execution.aggregate.SortBasedAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortBasedAggregateExec.scala:74)
> at
> org.apache.spark.sql.execution.aggregate.SortBasedAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortBasedAggregateExec.scala:71)
> at
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:775)
> at
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:775)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
> at org.apache.spark.scheduler.Task.run(Task.scala:85)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
> or alternatively:
>
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7fe571041cba, pid=2450, tid=140622965913344
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_75-b13) (build
> 1.7.0_75-b13)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.75-b04 mixed mode
> linux-amd64 compressed oops)
> # Problematic frame:
> # v  ~StubRoutines::jbyte_disjoint_arraycopy
>
> i assume the best thing would be to try to get it to print out the
> generated code that is causing this?
> what switch do i need to use again to do so?
> thanks,
> koert
>
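
On the question of printing the generated code quoted above: one way in the 2.0
snapshots (a sketch, assuming the debug helpers in
org.apache.spark.sql.execution.debug; the toy query is a placeholder for the
failing join/aggregate):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug._

val spark = SparkSession.builder().master("local[2]").appName("codegen-dump").getOrCreate()
import spark.implicits._

// debugCodegen() prints the Java source produced by whole-stage code
// generation for each codegen'd subtree of the plan, i.e. the generated
// classes that show up in the stack traces above.
Seq(("a", 1L), ("a", 2L), ("b", 3L)).toDS()
  .filter(_._2 > 1L)
  .debugCodegen()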


Spark 2.0.0 preview docs uploaded

2016-06-08 Thread Sean Owen
OK, this is done:

http://spark.apache.org/documentation.html
http://spark.apache.org/docs/2.0.0-preview/
http://spark.apache.org/docs/preview/

On Tue, Jun 7, 2016 at 4:59 PM, Shivaram Venkataraman
 wrote:
> As far as I know the process is just to copy docs/_site from the build
> to the appropriate location in the SVN repo (i.e.
> site/docs/2.0.0-preview).
>
> Thanks
> Shivaram
>
> On Tue, Jun 7, 2016 at 8:14 AM, Sean Owen  wrote:
>> As a stop-gap, I can edit that page to have a small section about
>> preview releases and point to the nightly docs.
>>
>> Not sure who has the power to push 2.0.0-preview to site/docs, but, if
>> that's done then we can symlink "preview" in that dir to it and be
>> done, and update this section about preview docs accordingly.
>>
