Use of UnsafeRow

2015-09-01 Thread Ulanov, Alexander
Dear Spark developers, Could you explain the intended use of UnsafeRow (apart from Tungsten groupBy and sort) and give an example of how to use it? 1) Is it intended to be instantiated as a copy of a Row in order to perform in-place modifications of it? 2) Can I create a new UnsafeRow

Re: Tungsten off heap memory access for C++ libraries

2015-09-01 Thread Paul Weiss
https://issues.apache.org/jira/browse/SPARK-10399 is the JIRA to track. On Sep 1, 2015 5:32 PM, "Paul Wais" wrote: > Paul: I've worked on running C++ code on Spark at scale before (via JNA, ~200 cores) and am working on something more contribution-oriented now (via

[VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-01 Thread Reynold Xin
Please vote on releasing the following candidate as Apache Spark version 1.5.0. The vote is open until Friday, Sep 4, 2015 at 21:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.5.0 [ ] -1 Do not release this package because ...

Re: Tungsten off heap memory access for C++ libraries

2015-09-01 Thread Paul Wais
Paul: I've worked on running C++ code on Spark at scale before (via JNA, ~200 cores) and am working on something more contribution-oriented now (via JNI). A few comments: * If you need something *today*, try JNA. It can be slow (e.g. a short native function in a tight loop) but works if you

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-09-01 Thread Sean Owen
That's correct for the 1.5 branch, right? This doesn't mean that the next RC would have this value. You choose the release version during the release process. On Tue, Sep 1, 2015 at 2:40 AM, Chester Chen wrote: > Seems that GitHub branch-1.5 is already changing the version to

Re: Tungsten off heap memory access for C++ libraries

2015-09-01 Thread Reynold Xin
Please do. Thanks. On Mon, Aug 31, 2015 at 5:00 AM, Paul Weiss wrote: > Sounds good, want me to create a JIRA and link it to SPARK-9697? Will put down some ideas to start. >> On Aug 31, 2015 4:14 AM, "Reynold Xin" wrote: >>> BTW if you are

[ compress in-memory column storage used in sparksql cache table ]

2015-09-01 Thread Wangchangchun (A)
Hi, I have an idea; can someone give me some advice? I want to compress the data in the in-memory column storage that is used by cache table in Spark SQL. This will make cached tables use less memory. I will add a conf for this feature, so anyone who wants to use it can set this conf to

taking n rows from an RDD starting from an index

2015-09-01 Thread Niranda Perera
Hi all, I have a large data set which would not fit into memory. So, I want to take n rows from the RDD starting at a particular index. For example, take 1000 rows starting from index 1001. I see that there is a take(num: Int): Array[T] method in the RDD, but it only returns
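
One common way to approach the question above is to pair each element with its position and keep only the positions in the wanted window. The following is a minimal sketch, not an answer taken from the thread. It assumes the RDD methods zipWithIndex, filter, map and collect; take_range and LocalRDD are hypothetical names, and LocalRDD is only an in-memory stand-in so the example runs without a Spark cluster. Indices produced by zipWithIndex are 0-based.

```python
# Sketch: take n elements of an RDD starting at a 0-based index.
# take_range works with any object exposing zipWithIndex/filter/map/collect,
# as PySpark RDDs do. LocalRDD is a hypothetical stand-in, not a Spark class.


def take_range(rdd, start, n):
    """Return up to n elements beginning at 0-based position `start`."""
    indexed = rdd.zipWithIndex()  # pairs of (element, index)
    in_range = indexed.filter(lambda pair: start <= pair[1] < start + n)
    return in_range.map(lambda pair: pair[0]).collect()


class LocalRDD:
    """Hypothetical in-memory stand-in mimicking four RDD methods."""

    def __init__(self, items):
        self._items = list(items)

    def zipWithIndex(self):
        return LocalRDD((x, i) for i, x in enumerate(self._items))

    def filter(self, pred):
        return LocalRDD(x for x in self._items if pred(x))

    def map(self, fn):
        return LocalRDD(fn(x) for x in self._items)

    def collect(self):
        return list(self._items)


rows = LocalRDD(range(5000))
window = take_range(rows, 1001, 1000)  # 1000 rows starting from index 1001
```

Note that this approach scans the whole data set to select the window, and collect brings the selected rows to the driver, so it suits modest values of n.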

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-09-01 Thread chester
Sorry, I still don't follow. I assume the release would build from 1.5.0 before moving to 1.5.1. Are you saying that 1.5.0 RC3 could be built from a 1.5.1 snapshot during the release? Or would 1.5.0 RC3 be built from the last commit of 1.5.0 (before the change to the 1.5.1 snapshot)?

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-09-01 Thread Sean Owen
The head of branch 1.5 will always be a "1.5.x-SNAPSHOT" version. Yeah technically you would expect it to be 1.5.0-SNAPSHOT until 1.5.0 is released. In practice I think it's simpler to follow the defaults of the Maven release plugin, which will set this to 1.5.1-SNAPSHOT after any 1.5.0-rc is

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-09-01 Thread chester
Thanks for the explanation. Since 1.5.0 RC3 is not yet released, I assume it would be cut from the 1.5 branch; doesn't that bring in 1.5.1 snapshot code? The reason I am asking these questions is that I would like to know, if I want to build 1.5.0 myself, which commit I should use.

[SparkR] lint script for SparkR

2015-09-01 Thread Yu Ishikawa
Hi all, Shivaram and I added a lint script for SparkR, `dev/lint-r`, and it is already running on Jenkins. If there are any validation problems in your patch, Jenkins will fail. Could you please make sure that your patch doesn't have any validation problems on your local machine before

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-09-01 Thread Sean Owen
Any 1.5 RC comes from the latest state of the 1.5 branch at some point in time. The next RC will be cut from whatever the latest commit is. You can see the tags in git for the specific commits for each RC. There's no such thing as "1.5.1 SNAPSHOT" commits, just commits to branch 1.5. I would

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-09-01 Thread Chester Chen
Thanks Sean, that makes it clear. On Tue, Sep 1, 2015 at 7:17 AM, Sean Owen wrote: > Any 1.5 RC comes from the latest state of the 1.5 branch at some point in time. The next RC will be cut from whatever the latest commit is. You can see the tags in git for the specific