Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-03-19 Thread Ted Yu
com> wrote: > Looks like the other packages may also be corrupt. I’m getting the same > error for the Spark 1.6.1 / Hadoop 2.4 package. > > > https://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.4.tgz > > Nick > ​ > > On Wed, Mar 16, 2016 at 8:28 P

Re: Does anyone implement org.apache.spark.serializer.Serializer in their own code?

2016-03-07 Thread Ted Yu
Josh: SerializerInstance and SerializationStream would also become private[spark], right ? Thanks On Mon, Mar 7, 2016 at 6:57 PM, Josh Rosen wrote: > Does anyone implement Spark's serializer interface > (org.apache.spark.serializer.Serializer) in your own third-party

Re: Spark SQL drops the HIVE table in "overwrite" mode while writing into table

2016-03-05 Thread Ted Yu
Please include the stack trace, code snippet, etc. in the JIRA you created so that people can reproduce what you saw. On Sat, Mar 5, 2016 at 7:02 AM, Dhaval Modi wrote: > > Regards, > Dhaval Modi > dhavalmod...@gmail.com > > -- Forwarded message -- > From: Dhaval Modi

Re: Set up a Coverity scan for Spark

2016-03-04 Thread Ted Yu
gt; - bad equals/hashCode > > On Fri, Mar 4, 2016 at 2:52 PM, Ted Yu <yuzhih...@gmail.com> wrote: > > Last time I checked there wasn't high impact defects. > > > > Mind pointing out the defects you think should be fixed ? > > > > Thanks > > >

Re: Set up a Coverity scan for Spark

2016-03-04 Thread Ted Yu
va code. I'm not suggesting anyone run it regularly, > but one run to catch some bugs is useful. > > I've already triaged ~70 issues there just in the Java code, of which > a handful are important. > > On Fri, Mar 4, 2016 at 12:18 PM, Ted Yu <yuzhih...@gmail.com> wrote: > >

Re: Set up a Coverity scan for Spark

2016-03-04 Thread Ted Yu
Since the majority of the code is written in Scala, which is not analyzed by Coverity, the efficacy of the tool seems limited. > On Mar 4, 2016, at 2:34 AM, Sean Owen wrote: > > https://scan.coverity.com/projects/apache-spark-2f9d080d-401d-47bc-9dd1-7956c411fbb4?tab=overview > >

Re: Spark log4j fully qualified class name

2016-02-27 Thread Ted Yu
Looking at https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PatternLayout.html *WARNING* Generating the caller class information is slow. Thus, use should be avoided unless execution speed is not an issue. On Sat, Feb 27, 2016 at 12:40 PM, Prabhu Joseph
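The cost Ted is pointing at comes from the `%C` (caller class) conversion pattern, which walks the stack on every log call. A minimal sketch of the two alternatives (appender names are illustrative; `%c` prints the logger name cheaply, `%C` generates caller information slowly):

```
# log4j.properties -- illustrative fragment
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout

# Cheap: %c is the logger name (usually the class that created the logger)
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Slow: %C inspects the call stack to find the actual calling class --
# avoid unless execution speed is not an issue
#log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %C{1}: %m%n
```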

Re: Hbase in spark

2016-02-26 Thread Ted Yu
In HBase, there is an hbase-spark module which supports bulk load. This module is to be backported in the upcoming 1.3.0 release. There is some pending work, such as HBASE-15271. FYI On Fri, Feb 26, 2016 at 8:50 AM, Renu Yadav wrote: > Has anybody implemented bulk load into

Re: Opening a JIRA for QuantileDiscretizer bug

2016-02-22 Thread Ted Yu
When you click on Create, you're brought to 'Create Issue' dialog where you choose Project Spark. Component should be MLlib. Please see also: http://search-hadoop.com/m/q3RTtmsshe1W6cH22/spark+pull+template=pull+request+template On Mon, Feb 22, 2016 at 6:45 PM, Pierson, Oliver C

Re: Reply: a new FileFormat 5x~100x faster than parquet

2016-02-22 Thread Ted Yu
The referenced benchmark is in Chinese. Please provide an English version so that more people can understand it. For item 7, it looks like the speed of ingest is much slower compared to using Parquet. Cheers On Mon, Feb 22, 2016 at 6:12 AM, 开心延年 wrote: > 1.ya100 is not only the

Re: Building Spark with a Custom Version of Hadoop: HDFS ClassNotFoundException

2016-02-11 Thread Ted Yu
Hdfs class is in hadoop-hdfs-XX.jar Can you check the classpath to see if the above jar is there ? Please describe the command lines you used for building hadoop / Spark. Cheers On Thu, Feb 11, 2016 at 5:15 PM, Charlie Wright wrote: > I am having issues trying to run a

Re: Error aliasing an array column.

2016-02-09 Thread Ted Yu
Do you mind pastebin'ning code snippet and exception one more time - I couldn't see them in your original email. Which Spark release are you using ? On Tue, Feb 9, 2016 at 11:55 AM, rakeshchalasani wrote: > Hi All: > > I am getting an "UnsupportedOperationException" when

Re: Error aliasing an array column.

2016-02-09 Thread Ted Yu
ethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at > org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) > at > org.apache.spark.re

Re: Error aliasing an array column.

2016-02-09 Thread Ted Yu
> > ++ > |arrayCol| > ++ > | [0, 1]| > | [1, 2]| > | [2, 3]| > | [3, 4]| > | [4, 5]| > | [5, 6]| > | [6, 7]| > | [7, 8]| > | [8, 9]| > | [9, 10]| > ++ > > > > On Tue, Feb 9, 2016 at 4:52 PM

Re: Welcoming two new committers

2016-02-08 Thread Ted Yu
Congratulations, Herman and Wenchen. On Mon, Feb 8, 2016 at 9:15 AM, Matei Zaharia wrote: > Hi all, > > The PMC has recently added two new Spark committers -- Herman van Hovell > and Wenchen Fan. Both have been heavily involved in Spark SQL and Tungsten, > adding new

Re: Building Spark with Custom Hadoop Version

2016-02-04 Thread Ted Yu
Assuming your change is based on hadoop-2 branch, you can use 'mvn install' command which would put artifacts under 2.8.0-SNAPSHOT subdir in your local maven repo. Here is an example: ~/.m2/repository/org/apache/hadoop/hadoop-hdfs/2.8.0-SNAPSHOT Then you can use the following command to build
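The flow described above can be sketched as follows (paths and the snapshot version are illustrative; substitute your own checkout locations):

```shell
# 1) Install the patched Hadoop into the local Maven repository.
#    Artifacts land under ~/.m2/repository/org/apache/hadoop/.../2.8.0-SNAPSHOT
cd /path/to/hadoop-checkout
mvn -DskipTests install

# 2) Build Spark against that locally installed snapshot.
cd /path/to/spark-checkout
./build/mvn -Phadoop-2.4 -Dhadoop.version=2.8.0-SNAPSHOT -DskipTests package
```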

Re: Encrypting jobs submitted by the client

2016-02-02 Thread Ted Yu
For #1, a brief search landed the following: core/src/main/scala/org/apache/spark/SparkConf.scala: DeprecatedConfig("spark.rpc", "2.0", "Not used any more.") core/src/main/scala/org/apache/spark/SparkConf.scala: "spark.rpc.numRetries" -> Seq(

Re: Secure multi tenancy on in stand alone mode

2016-02-01 Thread Ted Yu
w.r.t. running Spark on YARN, there are a few outstanding issues. e.g. SPARK-11182 HDFS Delegation Token See also the comments under SPARK-12279 FYI On Mon, Feb 1, 2016 at 1:02 PM, eugene miretsky wrote: > When having multiple users sharing the same Spark cluster,

Re: Scala 2.11 default build

2016-02-01 Thread Ted Yu
The following jobs have been established for build against Scala 2.10: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-master-COMPILE-MAVEN-SCALA-2.10/ https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-master-COMPILE-sbt-SCALA-2.10/ FYI On

Re: Scala 2.11 default build

2016-01-30 Thread Ted Yu
Does this mean the following Jenkins builds can be disabled ? https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-master-COMPILE-MAVEN-SCALA-2.11/ https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-master-COMPILE-sbt-SCALA-2.11/ Cheers On Sat, Jan

Re: Spark not able to fetch events from Amazon Kinesis

2016-01-30 Thread Ted Yu
w.r.t. protobuf-java version mismatch, I wonder if you can rebuild Spark with the following change (using maven): http://pastebin.com/fVQAYWHM Cheers On Sat, Jan 30, 2016 at 12:49 AM, Yash Sharma wrote: > Hi All, > I have a quick question if anyone has experienced this

Re: Reply: Spark 1.6.0 + Hive + HBase

2016-01-28 Thread Ted Yu
gnoreCase("string"); > > String tsColName = null; > if (iTimestamp >= 0) { > tsColName = > jobConf.get(serdeConstants.LIST_COLUMNS).split(",")[iTimestamp]; > } > > > > -- Original Message ------ > *From:* "Jörn Fran

Re: build error: code too big: specialStateTransition(int, IntStream)

2016-01-28 Thread Ted Yu
After this change: [SPARK-12681] [SQL] split IdentifiersParser.g into two files the biggest file under sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser is SparkSqlParser.g Maybe split SparkSqlParser.g up as well ? On Thu, Jan 28, 2016 at 5:21 AM, Iulian Dragoș

Re: Spark 1.6.0 + Hive + HBase

2016-01-28 Thread Ted Yu
For the last two problems, hbase-site.xml seems not to be on classpath. Once hbase-site.xml is put on classpath, you should be able to make progress. Cheers > On Jan 28, 2016, at 1:14 AM, Maciej Bryński wrote: > > Hi, > I'm trying to run SQL query on Hive table which is

Re: timeout in shuffle problem

2016-01-24 Thread Ted Yu
Cycling past bits: http://search-hadoop.com/m/q3RTtU5CRU1KKVA42=RE+shuffle+FetchFailedException+in+spark+on+YARN+job On Sun, Jan 24, 2016 at 5:52 AM, wangzhenhua (G) wrote: > Hi, > > I have a problem of time out in shuffle, it happened after shuffle write > and at the

Re: Spark 1.6.0 and HDP 2.2 - problem

2016-01-13 Thread Ted Yu
I would suggest trying option #1 first. Thanks > On Jan 13, 2016, at 2:12 AM, Maciej Bryński wrote: > > Hi, > I/m trying to run Spark 1.6.0 on HDP 2.2 > Everything was fine until I tried to turn on dynamic allocation. > According to instruction I need to add shuffle service

Re: Dependency on TestingUtils in a Spark package

2016-01-12 Thread Ted Yu
There is no annotation in TestingUtils class indicating whether it is suitable for consumption by external projects. You should assume the class is not public since its methods may change in future Spark releases. Cheers On Tue, Jan 12, 2016 at 12:36 PM, Robert Dodier

Re: Tungsten in a mixed endian environment

2016-01-12 Thread Ted Yu
I logged SPARK-12778 where endian awareness in Platform.java should help in mixed endian set up. There could be other parts of the code base which are related. Cheers On Tue, Jan 12, 2016 at 7:01 AM, Adam Roberts wrote: > Hi all, I've been experimenting with DataFrame

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Ted Yu
+1 > On Jan 5, 2016, at 10:49 AM, Davies Liu wrote: > > +1 > > On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas > wrote: >> +1 >> >> Red Hat supports Python 2.6 on REHL 5 until 2020, but otherwise yes, Python >> 2.6 is ancient history and

Re: IndentationCheck of checkstyle

2015-12-30 Thread Ted Yu
Right. Pardon my carelessness. > On Dec 29, 2015, at 9:58 PM, Reynold Xin <r...@databricks.com> wrote: > > OK to close the loop - this thread has nothing to do with Spark? > > >> On Tue, Dec 29, 2015 at 9:55 PM, Ted Yu <yuzhih...@gmail.com> wrote: >>

IndentationCheck of checkstyle

2015-12-29 Thread Ted Yu
Hi, I noticed that there are a lot of checkstyle warnings in the following form: To my knowledge, we use two spaces for each tab. Not sure why all of a sudden we have so many IndentationCheck warnings: grep 'hild have incorrect indentati' trunkCheckstyle.xml | wc 3133 52645 678294 If

Re: IndentationCheck of checkstyle

2015-12-29 Thread Ted Yu
>> >> >> format issue I think, go ahead >> >> >> >> >> At 2015-12-30 13:36:05, "Ted Yu" <yuzhih...@gmail.com> wrote: >> Hi, >> I noticed that there are a lot of checkstyle warnings in the following form: >> >>

Re: what is the best way to debug spark / mllib?

2015-12-27 Thread Ted Yu
For #1, 9 minutes seems to be normal. Here was the duration for a recent build on the master branch: [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 10:44

Re: Akka with Spark

2015-12-27 Thread Ted Yu
a processes separate from Spark processes, so you can >>>> monitor, debug, and scale them independently. So consider streaming data >>>> from Akka to Spark Streaming or go the other way, from Spark to Akka >>>> Streams. >>>> >>>> dean >>>

Re: Akka with Spark

2015-12-26 Thread Ted Yu
Do you mind sharing your use case ? It may be possible to use a different approach than Akka. Cheers On Sat, Dec 26, 2015 at 10:08 AM, Disha Shrivastava wrote: > Hi, > > I wanted to know how to use Akka framework with Spark starting from > basics. I saw online that Spark

recurring test failures against hadoop-2.4 profile

2015-12-25 Thread Ted Yu
Hi, You may have noticed the following test failures: org.apache.spark.sql.hive.execution.HiveUDFSuite.UDFIntegerToString org.apache.spark.sql.hive.execution.SQLQuerySuite.udf_java_method Tracing backwards, they started failing since this build:

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-25 Thread Ted Yu
I found that SBT build for Scala 2.11 has been failing ( https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-branch-1.6-COMPILE-SBT-SCALA-2.11/3/consoleFull ) I logged SPARK-12527 and sent a PR. FYI On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust

Re: [DAGScheduler] resubmitFailedStages, failedStages.clear() and submitStage

2015-12-24 Thread Ted Yu
getMissingParentStages(stage) would be called for the stage (being re-submitted) If there is no missing parents, submitMissingTasks() would be called. If there is missing parent(s), the parent would go through the same flow. I don't see issue in this part of the code. Cheers On Thu, Dec 24,

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-22 Thread Ted Yu
Running the test suite, there was a timeout in the hive-thriftserver module. This has been fixed by SPARK-11823. So I assume this is a test issue. lgtm On Tue, Dec 22, 2015 at 2:28 PM, Benjamin Fradet wrote: > +1 > On 22 Dec 2015 9:54 p.m., "Andrew Or"

Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-21 Thread Ted Yu
In Jerry's example, the first SparkContext, sc, has been stopped. So there would be only one SparkContext running at any given moment. Cheers On Mon, Dec 21, 2015 at 8:23 AM, Chester @work wrote: > Jerry > I thought you should not create more than one SparkContext

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Ted Yu
Ran test suite (minus docker-integration-tests) All passed +1 [INFO] Spark Project External ZeroMQ .. SUCCESS [ 13.647 s] [INFO] Spark Project External Kafka ... SUCCESS [ 45.424 s] [INFO] Spark Project Examples . SUCCESS [02:06

Re: does spark really support label expr like && or || ?

2015-12-16 Thread Ted Yu
Allen: Since you mentioned scheduling, I assume you were talking about node label support in YARN. If that is the case, can you give us some more information: How node labels are setup in YARN cluster How you specified node labels in application Hadoop and Spark releases you are using Cheers >

Re: spark with label nodes in yarn

2015-12-15 Thread Ted Yu
k 1.5.0, what happened to me > was I was blocked to get the YARN containers by setting > spark.yarn.executor.nodeLabelExpression property. My question, > https://issues.apache.org/jira/browse/SPARK-7173 will fix this? > > > > Thanks > > Allen > > > > > >

Re: spark with label nodes in yarn

2015-12-15 Thread Ted Yu
mailto:sai.sai.s...@gmail.com] > *Sent:* December 15, 2015 18:07 > *To:* 张志强(旺轩) > *Cc:* Ted Yu; dev > *Subject:* Re: spark with label nodes in yarn > > > > SPARK-6470 only supports node label expression for executors. > > SPARK-7173 supports node label expression for A

Re: Maven build against Hadoop 2.4 times out

2015-12-14 Thread Ted Yu
.6 is pretty close to master, > I am wondering if there is any environment related issue. > > On Sun, Dec 13, 2015 at 3:38 PM, Ted Yu <yuzhih...@gmail.com> wrote: > >> Thanks for checking, Yin. >> >> Looks like the cause might be in one of the commits for build #4438

Re: Maven build against Hadoop 2.4 times out

2015-12-13 Thread Ted Yu
since 4438 and 4439 were failed > way before the thrift server tests. > > On Fri, Dec 11, 2015 at 10:27 AM, Ted Yu <yuzhih...@gmail.com> wrote: > >> Hi, >> You may have noticed that maven build against Hadoop 2.4 times out on >> Jenkins. >> >> The last

Maven build against Hadoop 2.4 times out

2015-12-11 Thread Ted Yu
Hi, You may have noticed that the maven build against Hadoop 2.4 times out on Jenkins. The last module is spark-hive-thriftserver. This seemed to start with build #4440. FYI

Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-02 Thread Ted Yu
I tried to run test suite and encountered the following: http://pastebin.com/DPnwMGrm FYI On Wed, Dec 2, 2015 at 12:39 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > -0 > > If spark-ec2 is still a supported part of the project, then we should > update its version lists as new

Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-02 Thread Ted Yu
+1 Ran through test suite (minus docker-integration-tests) which passed. Overall experience was much better compared with some of the prior RC's. [INFO] Spark Project External Kafka ... SUCCESS [ 53.956 s] [INFO] Spark Project Examples . SUCCESS

Re: [ANNOUNCE] Spark 1.6.0 Release Preview

2015-11-24 Thread Ted Yu
If I am not mistaken, the binaries for Scala 2.11 were generated against hadoop 1. What about binaries for Scala 2.11 against hadoop 2.x ? Cheers On Sun, Nov 22, 2015 at 2:21 PM, Michael Armbrust wrote: > In order to facilitate community testing of Spark 1.6.0, I'm

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-19 Thread Ted Yu
Should a new job be set up under Spark-Master-Maven-with-YARN for hadoop 2.6.x ? Cheers On Thu, Nov 19, 2015 at 5:16 PM, 张志强(旺轩) wrote: > I agreed > +1 > > -- > From: Reynold Xin > Date

Re: releasing Spark 1.4.2

2015-11-16 Thread Ted Yu
See this thread: http://search-hadoop.com/m/q3RTtLKc2ctNPcq=Re+Spark+1+4+2+release+and+votes+conversation+ > On Nov 15, 2015, at 10:53 PM, Niranda Perera wrote: > > Hi, > > I am wondering when spark 1.4.2 will be released? > > is it in the voting stage at the

Re: spark 1.4 GC issue

2015-11-15 Thread Ted Yu
Please take a look at http://www.infoq.com/articles/tuning-tips-G1-GC Cheers On Sat, Nov 14, 2015 at 10:03 PM, Renu Yadav wrote: > I have tried with G1 GC .Please if anyone can provide their setting for GC. > At code level I am : > 1.reading orc table usind dataframe > 2.map
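In the spirit of the article Ted links, a G1 starting point can be put in the executor JVM options. A minimal sketch (the flag values are illustrative starting points for a Java 7/8-era JVM, not recommendations; tune against your own GC logs):

```
# spark-defaults.conf -- illustrative G1 GC settings for executors
spark.executor.extraJavaOptions  -XX:+UseG1GC \
  -XX:InitiatingHeapOccupancyPercent=35 \
  -XX:ConcGCThreads=4 \
  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
```

The GC log output enabled by the last two flags is what lets you verify whether full GCs disappear after each adjustment.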

Re: SparkPullRequestBuilder coverage

2015-11-13 Thread Ted Yu
; It only runs tests that are impacted by the change. E.g. if you only > modify SQL, it won't run the core or streaming tests. > > > On Fri, Nov 13, 2015 at 11:17 AM, Ted Yu <yuzhih...@gmail.com> wrote: > >> Hi, >> I noticed that SparkPullRequestBuilder complet

Re: Seems jenkins is down (or very slow)?

2015-11-12 Thread Ted Yu
I was able to access the following where response was fast: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45806/ Cheers On Thu, Nov 12, 2015 at 6:21 PM, Yin Huai wrote: > Hi

Re: OLAP query using spark dataframe with cassandra

2015-11-09 Thread Ted Yu
Please consider using NoSQL engine such as hbase. Cheers > On Nov 9, 2015, at 3:03 PM, Andrés Ivaldi wrote: > > Hi, > I'm also considering something similar, Spark plain is too slow for my case, > a possible solution is use Spark as Multiple Source connector and basic >

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-08 Thread Ted Yu
Why did you directly jump to spark-streaming-mqtt module ? Can you drop 'spark-streaming-mqtt' and try again ? Not sure why 1.5.0-SNAPSHOT showed up. Were you using RC2 source ? Cheers On Sun, Nov 8, 2015 at 7:28 PM, 欧锐 <494165...@qq.com> wrote: > > build spark-streaming-mqtt_2.10 failed! > >

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-08 Thread Ted Yu
+1 On Sat, Nov 7, 2015 at 4:35 PM, Denny Lee wrote: > +1 > > > On Sat, Nov 7, 2015 at 12:01 PM Mark Hamstra > wrote: > >> +1 >> >> On Tue, Nov 3, 2015 at 3:22 PM, Reynold Xin wrote: >> >>> Please vote on releasing the

Re: Build fails due to...multiple overloaded alternatives of constructor RDDInfo define default arguments?

2015-11-07 Thread Ted Yu
Created a PR for the compilation error: https://github.com/apache/spark/pull/9538 Cheers On Sat, Nov 7, 2015 at 4:41 AM, Jacek Laskowski wrote: > Hi, > > Checked out the latest sources and the build failed: > > [error] >

Re: Calling stop on StreamingContext locks up

2015-11-07 Thread Ted Yu
Would the following change work for you ? diff --git a/core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala b/core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala index 61b5a4c..c330d25 100644 ---

Re: Master build fails ?

2015-11-06 Thread Ted Yu
Since maven is the preferred build vehicle, an Ivy-style dependency policy would produce surprising results compared to today's behavior. I would suggest staying with the current dependency policy. My two cents. On Fri, Nov 6, 2015 at 6:25 AM, Koert Kuipers wrote: > if there

Re: Master build fails ?

2015-11-05 Thread Ted Yu
; > Regards, > Dilip Biswal > Tel: 408-463-4980 > dbis...@us.ibm.com > > > > From:Ted Yu <yuzhih...@gmail.com> > To:Dilip Biswal/Oakland/IBM@IBMUS > Cc:Jean-Baptiste Onofré <j...@nanthrax.net>, "dev@spark.apache.org&

Re: Master build fails ?

2015-11-05 Thread Ted Yu
ng able to find > com.google.common.hash.HashCodes. > > Is there a solution to this ? > > Regards, > Dilip Biswal > Tel: 408-463-4980 > dbis...@us.ibm.com > > > > From:Jean-Baptiste Onofré <j...@nanthrax.net> > To:Ted Yu <yuzhih...@gmail.com> > Cc:"de

Re: State of the Build

2015-11-05 Thread Ted Yu
See previous discussion: http://search-hadoop.com/m/q3RTtPnPnzwOhBr FYI On Thu, Nov 5, 2015 at 4:30 PM, Stephen Boesch wrote: > Yes. The current dev/change-scala-version.sh mutates (/pollutes) the build > environment by updating the pom.xml in each of the subprojects. If you

Re: test failed due to OOME

2015-11-02 Thread Ted Yu
Looks like SparkListenerSuite doesn't OOM on QA runs compared to Jenkins builds. I wonder if this is due to difference between machines running QA tests vs machines running Jenkins builds. On Fri, Oct 30, 2015 at 1:19 PM, Ted Yu <yuzhih...@gmail.com> wrote: > I noticed that the Spa

Re: unscribe

2015-11-01 Thread Ted Yu
Please take a look at first section of spark.apache.org/community FYI On Sun, Nov 1, 2015 at 1:09 AM, Chenxi Li wrote: > unscribe >

Re: SparkLauncher#setJavaHome does not set JAVA_HOME in child process

2015-10-31 Thread Ted Yu
On Linux, I got the following test failure (with or without suggested change): testChildProcLauncher(org.apache.spark.launcher.SparkLauncherSuite) Time elapsed: 0.036 sec <<< FAILURE! java.lang.AssertionError: expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) at

Re: test failed due to OOME

2015-10-30 Thread Ted Yu
per-job basis (this doesn't > > scale that well). > > > > thoughts? > > > > On Fri, Oct 30, 2015 at 9:47 AM, Ted Yu <yuzhih...@gmail.com> wrote: > >> This happened recently on Jenkins: > >> > >> > https://amplab.cs.berkeley.edu/j

Re: Exception when using some aggregate operators

2015-10-28 Thread Ted Yu
nal >> aggregate functions not supposed to be used or I am using them in the wrong >> way or is it a bug as I asked in my first mail. >> >> On Wed, Oct 28, 2015 at 3:20 AM, Ted Yu <yuzhih...@gmail.com> wrote: >> >>> Have you tried using avg in place o

Re: Exception when using some aggregate operators

2015-10-28 Thread Ted Yu
unsodh...@gmail.com > > wrote: > >> Also are the other aggregate functions to be treated as bugs or not? >> >> On Wed, Oct 28, 2015 at 4:08 PM, Shagun Sodhani <sshagunsodh...@gmail.com >> > wrote: >> >>> Wouldnt it be: >>> >>>

Re: Trouble creating JIRA issue

2015-10-22 Thread Ted Yu
You can use the following link: https://issues.apache.org/jira/secure/CreateIssue!default.jspa Remember to select Spark as the project. On Thu, Oct 22, 2015 at 9:38 AM, Richard Marscher wrote: > Hi, > > I'm working on following the guidelines for contributing code to

Re: Problem building Spark

2015-10-19 Thread Ted Yu
See this thread http://search-hadoop.com/m/q3RTtV3VFNdgNri2=Re+Build+spark+1+5+1+branch+fails > On Oct 19, 2015, at 6:59 PM, Annabel Melongo > wrote: > > I tried to build Spark according to the build directions and the it failed > due to the following error:

test failed due to OOME

2015-10-18 Thread Ted Yu
From https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=spark-test/3846/console : SparkListenerSuite:- basic creation and shutdown of LiveListenerBus- bus.stop() waits for the event queue to completely drain- basic creation of StageInfo- basic

Re: SPARK_MASTER_IP actually expects a DNS name, not IP address

2015-10-16 Thread Ted Yu
for `SPARK_MASTER_IP`, amazingly, does not show it > being used in any place directly by Spark > <https://github.com/apache/spark/search?utf8=%E2%9C%93=SPARK_MASTER_IP>. > > Clearly, Spark is using this environment variable (otherwise I wouldn't > see the behavior described in my

Re: Building Spark

2015-10-15 Thread Ted Yu
bq. Access is denied Please check permission of the path mentioned. On Thu, Oct 15, 2015 at 3:45 PM, Annabel Melongo < melongo_anna...@yahoo.com.invalid> wrote: > I was trying to build a cloned version of Spark on my local machine using > the command: > mvn -Pyarn -Phadoop-2.4

Re: SPARK_MASTER_IP actually expects a DNS name, not IP address

2015-10-14 Thread Ted Yu
Some old bits: http://stackoverflow.com/questions/28162991/cant-run-spark-1-2-in-standalone-mode-on-mac http://stackoverflow.com/questions/29412157/passing-hostname-to-netty FYI On Wed, Oct 14, 2015 at 7:10 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > I’m setting the Spark

Re: Getting started

2015-10-13 Thread Ted Yu
Please see https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Tue, Oct 13, 2015 at 5:49 AM, _abhishek wrote: > Hello > I am interested in contributing to apache spark.I am new to open source.Can > someone please help me with how to get

Re: taking the heap dump when an executor goes OOM

2015-10-12 Thread Ted Yu
http://stackoverflow.com/questions/542979/using-heapdumponoutofmemoryerror-parameter-for-heap-dump-for-jboss > On Oct 11, 2015, at 10:45 PM, Niranda Perera wrote: > > Hi all, > > is there a way for me to get the heap-dump hprof of an executor jvm, when it > goes out
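Applied to Spark executors, the JVM flags from that answer go into the executor Java options. A sketch (the dump path is illustrative and must exist and be writable on every worker node):

```
# spark-defaults.conf -- capture an executor heap dump when it OOMs
spark.executor.extraJavaOptions  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/tmp/executor-dumps
```

The resulting `.hprof` file is written on the worker that hosted the failing executor, so it has to be collected from that machine afterwards.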

Re: Flaky Jenkins tests?

2015-10-12 Thread Ted Yu
You can go to: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN and see if the test failure(s) you encountered appeared there. FYI On Mon, Oct 12, 2015 at 1:24 PM, Meihua Wu wrote: > Hi Spark Devs, > > I recently encountered several cases

Re: Flaky Jenkins tests?

2015-10-12 Thread Ted Yu
in _get_connection > IndexError: pop from an empty deque > > > > On Mon, Oct 12, 2015 at 1:36 PM, Ted Yu <yuzhih...@gmail.com> wrote: > > You can go to: > > https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN > > > > and

Re: Flaky Jenkins tests?

2015-10-12 Thread Ted Yu
; On October 12, 2015 at 2:45:13 PM, Ted Yu (yuzhih...@gmail.com) wrote: > > Can you re-submit your PR to trigger a new build - assuming the tests are > flaky ? > > If any test fails again, consider contacting the owner of the module for > expert opinion. > > Cheers &

Re: Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-08 Thread Ted Yu
Interesting https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/ shows green builds. On Thu, Oct 8, 2015 at 6:40 AM, Iulian Dragoș wrote: > Since Oct. 4 the build fails on 2.11 with the dreaded > > [error]

Re: Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-08 Thread Ted Yu
] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 17:49 min FYI On Thu, Oct 8, 2015 at 6:50 AM, Ted Yu <yuzhih...@gmail.com> wrote: > Interesting > > > https://amplab.cs.be

Re: Compiling Spark with a local hadoop profile

2015-10-08 Thread Ted Yu
In the root pom.xml : <hadoop.version>2.2.0</hadoop.version> You can override the version of hadoop with a command similar to: -Phadoop-2.4 -Dhadoop.version=2.7.0 Cheers On Thu, Oct 8, 2015 at 11:22 AM, sbiookag wrote: > I'm modifying hdfs module inside hadoop, and would like the see the > reflection while
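Put together as a full build command, the override looks like this (the profile and version are the ones from the reply; the rest of the invocation is an illustrative sketch):

```shell
# Build Spark against Hadoop 2.7.0 instead of the default hadoop.version
# declared in the root pom.xml
./build/mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -DskipTests clean package
```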

Re: IllegalArgumentException: Size exceeds Integer.MAX_VALUE

2015-10-05 Thread Ted Yu
As a workaround, can you set the number of partitions higher in the sc.textFile method ? Cheers On Mon, Oct 5, 2015 at 3:31 PM, Jegan wrote: > Hi All, > > I am facing the below exception when the size of the file being read in a > partition is above 2GB. This is apparently
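The workaround can be sketched as follows (path and partition count are illustrative). Spark's shuffle blocks are backed by byte arrays, hence the `Integer.MAX_VALUE` (2 GB) limit per partition; raising the partition count shrinks each partition below that ceiling:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("many-partitions"))

// The second argument to textFile is minPartitions: pick it large enough
// that no single partition holds anywhere near 2 GB of data.
val lines = sc.textFile("hdfs:///data/big-input", 2000)

// Alternatively, repartition an existing RDD before the expensive stage:
val smaller = lines.repartition(2000)
```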

Re: Spark 1.5.1 - Scala 2.10 - Hadoop 1 package is missing from S3

2015-10-04 Thread Ted Yu
hadoop1 package for Scala 2.10 wasn't in RC1 either: http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/ On Sun, Oct 4, 2015 at 5:17 PM, Nicholas Chammas wrote: > I’m looking here: > > https://s3.amazonaws.com/spark-related-packages/ > > I believe

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread Ted Yu
I tried to access https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/spark-streaming_2.10-1.5.0.pom on Chrome and Firefox (on Mac) I got 404 FYI On Fri, Oct 2, 2015 at 10:49 AM, andy petrella wrote: > Yup folks, > > I've been reported by someone

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread Ted Yu
oct. 2015 20:08, Ted Yu <yuzhih...@gmail.com> a écrit : > >> Andy: >> 1.5.1 has been released. >> >> Maybe you can use this: >> >> https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.1/spark-streaming_2.10-1.5.1.pom >> >>

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread Ted Yu
too (did not get it before). Maybe the servers are > having issues. > > On Fri, Oct 2, 2015 at 11:05 AM, Ted Yu <yuzhih...@gmail.com> wrote: > > I tried to access > > > https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/spark-streaming_2.10-1.5.

Re: failed to run spark sample on windows

2015-09-28 Thread Ted Yu
What version of hadoop are you using ? Is that version consistent with the one which was used to build Spark 1.4.0 ? Cheers On Mon, Sep 28, 2015 at 4:36 PM, Renyi Xiong wrote: > I tried to run HdfsTest sample on windows spark-1.4.0 > > bin\run-sample

Re: Derby version in Spark

2015-09-22 Thread Ted Yu
Which Spark release are you building ? For master branch, I get the following: lib_managed/jars/datanucleus-api-jdo-3.2.6.jar lib_managed/jars/datanucleus-core-3.2.10.jar lib_managed/jars/datanucleus-rdbms-3.2.9.jar FYI On Tue, Sep 22, 2015 at 1:28 PM, Richard Hillegas

Re: Derby version in Spark

2015-09-22 Thread Ted Yu
xml-apis-1.4.01.jar > commons-math-2.2.jar jaxb-impl-2.2.3-1.jar paranamer-2.3.jar > xmlenc-0.52.jar > commons-math3-3.4.1.jar jaxb-impl-2.2.7.jar paranamer-2.6.jar xz-1.0.jar > commons-net-3.1.jar jblas-1.2.4.jar parquet-avro-1.7.0.jar > zookeeper-3.4.5.jar > commons-pool-1.5.

Re: Derby version in Spark

2015-09-22 Thread Ted Yu
I cloned the Hive 1.2 code base and saw: <derby.version>10.10.2.0</derby.version> So the version used by Spark is quite close to what Hive uses. On Tue, Sep 22, 2015 at 3:29 PM, Ted Yu <yuzhih...@gmail.com> wrote: > I see. > I use maven to build so I observe different contents under lib_managed > dire

Re: passing SparkContext as parameter

2015-09-21 Thread Ted Yu
You can use broadcast variable for passing connection information. Cheers > On Sep 21, 2015, at 4:27 AM, Priya Ch wrote: > > can i use this sparkContext on executors ?? > In my application, i have scenario of reading from db for certain records in > rdd. Hence I
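The advice above can be sketched as follows (all names are illustrative). The point is to broadcast plain connection *parameters*, not a live connection or the SparkContext itself: connection objects are not serializable, and a SparkContext cannot be used on executors. Each partition then opens its own connection from the broadcast config:

```scala
import java.sql.DriverManager
import org.apache.spark.{SparkConf, SparkContext}

// Plain serializable holder for connection information.
case class DbConfig(url: String, user: String, password: String)

val sc = new SparkContext(new SparkConf().setAppName("broadcast-conn-info"))
val dbConf = sc.broadcast(DbConfig("jdbc:postgresql://dbhost/app", "reader", "secret"))

val ids = sc.parallelize(1 to 100)
val rows = ids.mapPartitions { iter =>
  // One connection per partition, built on the executor from the broadcast value.
  val c = dbConf.value
  val conn = DriverManager.getConnection(c.url, c.user, c.password)
  try iter.map(id => (id, /* look up id via conn */ id.toString)).toList.iterator
  finally conn.close()
}
```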

Re: How to modify Hadoop APIs used by Spark?

2015-09-21 Thread Ted Yu
Can you clarify what you want to do: If you modify an existing hadoop InputFormat, etc, it would be a matter of rebuilding hadoop and building Spark using the custom-built hadoop as a dependency. Do you introduce a new InputFormat ? Cheers On Mon, Sep 21, 2015 at 1:20 PM, Dogtail Ray

Re: Using scala-2.11 when making changes to spark source

2015-09-20 Thread Ted Yu
Maybe the following can be used for changing Scala version: http://maven.apache.org/archetype/maven-archetype-plugin/ I played with it a little bit but didn't get far. FYI On Sun, Sep 20, 2015 at 6:18 AM, Stephen Boesch wrote: > > The dev/change-scala-version.sh [2.11]

Re: SparkR installation not working

2015-09-19 Thread Ted Yu
Looks like you didn't specify sparkr profile when building. Cheers On Sat, Sep 19, 2015 at 12:30 PM, Devl Devel wrote: > Hi All, > > I've built spark 1.5.0 with hadoop 2.6 with a fresh download : > > build/mvn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean

Re: (send this email to subscribe)

2015-09-13 Thread Ted Yu
See first section of http://spark.apache.org/community.html Cheers > On Sep 13, 2015, at 6:43 PM, 蒋林 wrote: > > Hi,I need subscribe email list,please send me,thank you > > >

Re: spark dataframe transform JSON to ORC meet “column ambigous exception”

2015-09-12 Thread Ted Yu
Is it possible that Canonical_URL occurs more than once in your json ? Can you check your json input ? Thanks On Sat, Sep 12, 2015 at 2:05 AM, Fengdong Yu wrote: > Hi, > > I am using spark1.4.1 data frame, read JSON data, then save it to orc. the > code is very

Re: spark dataframe transform JSON to ORC meet “column ambigous exception”

2015-09-12 Thread Ted Yu
; > Azuryy Yu > Sr. Infrastructure Engineer > > cel: 158-0164-9103 > wetchat: azuryy > > > On Sat, Sep 12, 2015 at 5:52 PM, Ted Yu <yuzhih...@gmail.com> wrote: > >> Is it possible that Canonical_URL occurs more than once in your json ? >> >> Can you check
