Re: Unable to access Resource Manager /Name Node on port 9026 / 9101 on a Spark EMR Cluster

2016-04-15 Thread Wei-Shun Lo
Hi Chanda, You may want to use nmap to check whether the port and service are correctly started locally, e.g. nmap localhost. If the port is already open internally, the problem might be related to the inbound/outbound traffic rules in your security group settings. Just FYI. On Fri,
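
The local check suggested in this reply can also be scripted. The following is a minimal sketch, not part of the original message: it uses Python's standard socket module to test whether anything accepts TCP connections on a port. The helper name is_port_open is hypothetical, and the ports 9026 and 9101 are taken from the thread subject.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports from the thread subject; run this on the node hosting the service.
    for port in (9026, 9101):
        state = "open" if is_port_open("localhost", port) else "closed or filtered"
        print(f"localhost:{port} is {state}")
```

If a port reports open when checked on the node itself but cannot be reached from outside, the security group rules are the more likely cause; if it is closed locally, the service itself probably did not start.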

Will not store rdd_16_4383 as it would require dropping another block from the same RDD

2016-04-15 Thread Alexander Pivovarov
I run Spark 1.6.1 on YARN (EMR-4.5.0). I call RDD.count on a MEMORY_ONLY_SER cached RDD (spark.serializer is KryoSerializer). After the count task is done, I noticed that the Spark UI shows that the RDD Fraction Cached is only 6%. Size in Memory = 65.3 GB. I looked at the executors' stderr in the Spark UI and saw lots

Re: Skipping Type Conversion and using InternalRows for UDF

2016-04-15 Thread Michael Armbrust
This would also probably improve performance: https://github.com/apache/spark/pull/9565 On Fri, Apr 15, 2016 at 8:44 AM, Hamel Kothari wrote: > Hi all, > > So we have these UDFs which take <1ms to operate and we're seeing pretty > poor performance around them in

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Mridul Muralidharan
On Friday, April 15, 2016, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote: > Yeah in support of this statement I think that my primary interest in > this Spark Extras and the good work by Luciano here is that anytime we > take bits out of a code base and “move it to GitHub” I see

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Cody Koeninger
100% agree with Sean & Reynold's comments on this. Adding this as a TLP would just cause more confusion as to "official" endorsement. On Fri, Apr 15, 2016 at 11:50 AM, Sean Owen wrote: > On Fri, Apr 15, 2016 at 5:34 PM, Luciano Resende wrote: >> I

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Mattmann, Chris A (3980)
Yeah in support of this statement I think that my primary interest in this Spark Extras and the good work by Luciano here is that anytime we take bits out of a code base and “move it to GitHub” I see a bad precedent being set. Creating this project at the ASF creates a synergy between *Apache

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Mattmann, Chris A (3980)
Hey Reynold, Thanks. Getting to the heart of this, I think that this project would be successful if the Apache Spark PMC decided to participate and there was some overlap. As much as I think it would be great to stand up another project, the goal here from Luciano and crew (myself included) would

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Jean-Baptiste Onofré
+1 Regards JB On 04/15/2016 06:41 PM, Mattmann, Chris A (3980) wrote: Yeah in support of this statement I think that my primary interest in this Spark Extras and the good work by Luciano here is that anytime we take bits out of a code base and “move it to GitHub” I see a bad precedent being

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Sean Owen
On Fri, Apr 15, 2016 at 5:34 PM, Luciano Resende wrote: > I know the name might be confusing, but I also think that the projects have > a very big synergy, more like sibling projects, where "Spark Extras" extends > the Spark community and develop/maintain components for, and

ClassFormatError in latest spark 2 SNAPSHOT build

2016-04-15 Thread Koert Kuipers
not sure why, but i am getting this today using spark 2 snapshots... i am on java 7 and scala 2.11.
16/04/15 12:35:46 WARN TaskSetManager: Lost task 2.0 in stage 3.0 (TID 15, localhost): java.lang.ClassFormatError: Duplicate field name in class file

Re: ClassFormatError in latest spark 2 SNAPSHOT build

2016-04-15 Thread Reynold Xin
Can you post the generated code? df.queryExecution.debug.codeGen() (Or something similar to that) On Friday, April 15, 2016, Koert Kuipers wrote: > not sure why, but i am getting this today using spark 2 snapshots... > i am on java 7 and scala 2.11 > > 16/04/15 12:35:46

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Luciano Resende
On Fri, Apr 15, 2016 at 9:34 AM, Cody Koeninger wrote: > Given that not all of the connectors were removed, I think this > creates a weird / confusing three tier system > > 1. connectors in the official project's spark/extras or spark/external > 2. connectors in "Spark

Re: BytesToBytes and unaligned memory

2016-04-15 Thread Ted Yu
I am curious if all Spark unit tests pass with the forced true value for unaligned. If that is the case, it seems we can add s390x to the known architectures. It would also give us some more background if you can describe how java.nio.Bits#unaligned() is implemented on s390x. Josh / Andrew /

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Luciano Resende
On Fri, Apr 15, 2016 at 9:18 AM, Sean Owen wrote: > Why would this need to be an ASF project of its own? I don't think > it's possible to have yet another separate "Spark Extras" TLP (?) > > There is already a project to manage these bits of code on GitHub. How > about all

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Cody Koeninger
Given that not all of the connectors were removed, I think this creates a weird / confusing three tier system 1. connectors in the official project's spark/extras or spark/external 2. connectors in "Spark Extras" 3. connectors in some random organization's github On Fri, Apr 15, 2016 at 11:18

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Chris Fregly
and how does this all relate to the existing 1-and-a-half-class citizen known as spark-packages.org? support for this citizen is buried deep in the Spark source (which was always a bit odd, in my opinion): https://github.com/apache/spark/search?utf8=%E2%9C%93&q=spark-packages On Fri, Apr 15,

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Sean Owen
Why would this need to be an ASF project of its own? I don't think it's possible to have yet another separate "Spark Extras" TLP (?) There is already a project to manage these bits of code on GitHub. How about all of the interested parties manage the code there, under the same process, under

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Luciano Resende
After some collaboration with other community members, we have created an initial draft for Spark Extras which is available for review at https://docs.google.com/document/d/1zRFGG4414LhbKlGbYncZ13nyX34Rw4sfWhZRA5YBtIE/edit?usp=sharing We would like to invite other community members to participate

Skipping Type Conversion and using InternalRows for UDF

2016-04-15 Thread Hamel Kothari
Hi all, So we have these UDFs which take <1ms to operate and we're seeing pretty poor performance around them in practice, the overhead being >10ms for the projections (this data is deeply nested with ArrayTypes and MapTypes so that could be the cause). Looking at the logs and code for ScalaUDF,

Re: BytesToBytes and unaligned memory

2016-04-15 Thread Ted Yu
Can you clarify whether BytesToBytesMapOffHeapSuite passed or failed with the forced true value for unaligned? If the test failed, please pastebin the failure(s). Thanks On Fri, Apr 15, 2016 at 8:32 AM, Adam Roberts wrote: > Ted, yep I'm working from the latest code

Re: BytesToBytes and unaligned memory

2016-04-15 Thread Ted Yu
I assume you tested 2.0 with SPARK-12181. Related code from Platform.java if java.nio.Bits#unaligned() throws exception:
// We at least know x86 and x64 support unaligned access.
String arch = System.getProperty("os.arch", "");
//noinspection
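
To make the fallback discussed in this message concrete, the sketch below mirrors the idea in Python: when the runtime cannot report whether unaligned access is supported, decide from the architecture name. The pattern and the extra_archs parameter are assumptions for illustration, not the contents of Platform.java; the quoted comment only confirms that x86 and x64 are treated as known.

```python
import platform
import re

# Assumed allow-list: the quoted comment only states that x86 and x64 are
# known to support unaligned access; the exact names matched are illustrative.
KNOWN_UNALIGNED = re.compile(r"^(i[3-6]86|x86(_64)?|x64|amd64)$")


def supports_unaligned(arch: str, extra_archs: frozenset = frozenset()) -> bool:
    """Guess unaligned-access support from an architecture name."""
    return bool(KNOWN_UNALIGNED.match(arch)) or arch in extra_archs


if __name__ == "__main__":
    arch = platform.machine().lower()
    print(arch, supports_unaligned(arch))
```

Under this sketch, an architecture such as s390x is reported as unsupported until it is added explicitly, for example supports_unaligned("s390x", frozenset({"s390x"})). That corresponds to the question raised in the thread of whether s390x can be added to the known architectures once the tests pass with the forced value.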

Unable to access Resource Manager /Name Node on port 9026 / 9101 on a Spark EMR Cluster

2016-04-15 Thread Chadha Pooja
Hi, We have set up a Spark cluster (3 nodes) on Amazon EMR. We aren't able to use ports 9026 and 9101 on the existing Spark EMR cluster, which are part of the Web UIs offered with Amazon EMR. I was able to use other ports, such as the Zeppelin port (8890), HUE, etc. We checked that the security settings

BytesToBytes and unaligned memory

2016-04-15 Thread Adam Roberts
Hi, I'm testing Spark 2.0.0 on various architectures and have a question: are we sure that core/src/test/java/org/apache/spark/unsafe/map/AbstractBytesToBytesMapSuite.java really is attempting to use unaligned memory access (for the BytesToBytesMapOffHeapSuite tests specifically)? Our JDKs on

Re: Should localProperties be inheritable? Should we change that or document it?

2016-04-15 Thread Marcin Tustin
It would be a pleasure. That said, what do you think about adding the non-inheritable feature? I think that would be a big win for everything that doesn't specifically need Inheritability. On Friday, April 15, 2016, Reynold Xin wrote: > I think this was added a long time

Re: Should localProperties be inheritable? Should we change that or document it?

2016-04-15 Thread Reynold Xin
I think this was added a long time ago by me in order to make certain things work for Shark (good old times ...). You are probably right that by now some apps depend on the fact that this is inheritable, and changing that could break them in weird ways. Do you mind documenting this, and also add
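
To illustrate what "inheritable" means in this discussion, the sketch below emulates the behaviour in Python: a child thread starts with a snapshot of the properties of the thread that created it, and later changes on either side stay separate. The class LocalProperties and its spawn method are hypothetical names for this illustration, not Spark APIs, and the inherit=False flag stands in for the non-inheritable option proposed in the thread.

```python
import threading


class LocalProperties:
    """Per-thread properties; a child may start with a copy of its parent's."""

    def __init__(self):
        self._store = threading.local()

    def get(self) -> dict:
        # Each thread lazily gets its own empty dict.
        if not hasattr(self._store, "props"):
            self._store.props = {}
        return self._store.props

    def spawn(self, target, inherit: bool = True) -> threading.Thread:
        # Snapshot is taken in the parent thread, at creation time.
        snapshot = dict(self.get()) if inherit else {}

        def run():
            # Runs in the child thread, so this sets the child's own copy.
            self._store.props = snapshot
            target()

        return threading.Thread(target=run)


if __name__ == "__main__":
    props = LocalProperties()
    props.get()["spark.scheduler.pool"] = "production"
    seen = {}
    child = props.spawn(lambda: seen.update(props.get()))
    child.start()
    child.join()
    print(seen)  # the child saw the parent's property
```

With inherit=True, code running in the child silently picks up settings made in the parent, which is the dependency the message warns about; with inherit=False the child starts empty, so any property it needs must be set explicitly.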