ReceiverTrackerSuite failing in master build

2015-07-28 Thread Ted Yu
Hi, I noticed that ReceiverTrackerSuite is failing in master Jenkins build for both hadoop profiles. The failure seems to start with: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/3104/ FYI

Re: Generalised Spark-HBase integration

2015-07-28 Thread Ted Yu
I got a compilation error: [INFO] /home/hbase/s-on-hbase/src/main/scala:-1: info: compiling [INFO] Compiling 18 source files to /home/hbase/s-on-hbase/target/classes at 1438099569598 [ERROR] /home/hbase/s-on-hbase/src/main/scala/org/apache/spark/hbase/examples/simple/HBaseTableSimple.scala:36:

Re: Package Release Annoucement: Spark SQL on HBase Astro

2015-08-03 Thread Ted Yu
When I tried to compile against hbase 1.1.1, I got: [ERROR] /home/hbase/ssoh/src/main/scala/org/apache/spark/sql/hbase/SparkSqlRegionObserver.scala:124: overloaded method next needs result type [ERROR] override def next(result: java.util.List[Cell], limit: Int) = next(result) Is there plan to

Re: Reply: Package Release Annoucement: Spark SQL on HBase Astro

2015-08-11 Thread Ted Yu
, …, etc., which allows for loosely-coupled query engines built on top of it. Thanks, From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: August 11, 2015 8:54 To: Bing Xiao (Bing) Cc: dev@spark.apache.org; u...@spark.apache.org; Yan Zhou.sc Subject: Re: Package Release Annoucement: Spark SQL

Re: subscribe

2015-08-13 Thread Ted Yu
See first section on https://spark.apache.org/community On Thu, Aug 13, 2015 at 9:44 AM, Naga Vij nvbuc...@gmail.com wrote: subscribe

Re: Automatically deleting pull request comments left by AmplabJenkins

2015-08-13 Thread Ted Yu
Thanks Josh for the initiative. I think reducing the redundancy in QA bot posts would make discussion on GitHub UI more focused. Cheers On Thu, Aug 13, 2015 at 7:21 PM, Josh Rosen rosenvi...@gmail.com wrote: Prototype is at https://github.com/databricks/spark-pr-dashboard/pull/59 On Wed,

Re: Automatically deleting pull request comments left by AmplabJenkins

2015-08-13 Thread Ted Yu
I tried accessing just now. It took several seconds before the page showed up. FYI On Thu, Aug 13, 2015 at 7:56 PM, Cheng, Hao hao.ch...@intel.com wrote: I found the https://spark-prs.appspot.com/ is super slow while open it in a new window recently, not sure just myself or everybody

Re: Package Release Annoucement: Spark SQL on HBase Astro

2015-08-10 Thread Ted Yu
Yan / Bing: Mind taking a look at HBASE-14181 https://issues.apache.org/jira/browse/HBASE-14181 'Add Spark DataFrame DataSource to HBase-Spark Module' ? Thanks On Wed, Jul 22, 2015 at 4:53 PM, Bing Xiao (Bing) bing.x...@huawei.com wrote: We are happy to announce the availability of the Spark

Re: Is there any way to support multiple users executing SQL on thrift server?

2015-08-06 Thread Ted Yu
What is the JIRA number if a JIRA has been logged for this ? Thanks On Jan 20, 2015, at 11:30 AM, Cheng Lian lian.cs@gmail.com wrote: Hey Yi, I'm quite unfamiliar with Hadoop/HDFS auth mechanisms for now, but would like to investigate this issue later. Would you please open an

Re: Asked to remove non-existent executor exception

2015-07-26 Thread Ted Yu
SparkDeploySchedulerBackend: Asked to remove non-existent executor 2... 15/07/23 13:26:41 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 2... -- Original Message -- *From:* Ted Yu;yuzhih...@gmail.com; *Sent:* Sunday, July 26, 2015, 10:51 PM *To:* Pa

Re: problems with build of latest the master

2015-07-14 Thread Ted Yu
Looking at Jenkins, master branch compiles. Can you try the following command ? mvn -Phive -Phadoop-2.6 -DskipTests clean package What version of Java are you using ? Cheers On Tue, Jul 14, 2015 at 2:23 AM, Gil Vernik g...@il.ibm.com wrote: I just did checkout of the master and tried to

Re: Apache gives exception when running groupby on df temp table

2015-07-16 Thread Ted Yu
Can you provide a bit more information such as: release of Spark you use snippet of your SparkSQL query Thanks On Thu, Jul 16, 2015 at 5:31 AM, nipun ibnipu...@gmail.com wrote: I have a dataframe. I register it as a temp table and run a spark sql query on it to get another dataframe. Now

Re: [discuss] Removing individual commit messages from the squash commit message

2015-07-18 Thread Ted Yu
+1 to removing commit messages. On Jul 18, 2015, at 1:35 AM, Sean Owen so...@cloudera.com wrote: +1 to removing them. Sometimes there are 50+ commits because people have been merging from master into their branch rather than rebasing. On Sat, Jul 18, 2015 at 8:48 AM, Reynold Xin

Re: Expression.resolved unmatched with the correct values in catalyst?

2015-07-18 Thread Ted Yu
What if you move your addition to before line 64 (in master branch there is case for if e.checkInputDataTypes().isFailure): case c: Cast if !c.resolved = Cheers On Wed, Jul 15, 2015 at 12:47 AM, Takeshi Yamamuro linguin@gmail.com wrote: Hi, devs I found that the case of

Re: If gmail, check sparm

2015-07-18 Thread Ted Yu
Interesting read. I did find a lot of Spark mails in Spam folder. Thanks Mridul On Jul 18, 2015, at 10:25 AM, Mridul Muralidharan mri...@gmail.com wrote: https://plus.google.com/+LinusTorvalds/posts/DiG9qANf5PA I have noticed a bunch of mails from dev@ and github going to spam -

KinesisStreamSuite failing in master branch

2015-07-19 Thread Ted Yu
Hi, I noticed that KinesisStreamSuite fails for both hadoop profiles in master Jenkins builds. From https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/3011/console : KinesisStreamSuite:*** RUN ABORTED *** java.lang.AssertionError:

Re: KinesisStreamSuite failing in master branch

2015-07-20 Thread Ted Yu
...@gmail.com wrote: Yep, I emailed TD about it; I think that we may need to make a change to the pull request builder to fix this. Pending that, we could just revert the commit that added this. On Sun, Jul 19, 2015 at 5:32 PM, Ted Yu yuzhih...@gmail.com wrote: Hi, I noticed

Re: ./dev/run-tests fail on master

2015-07-12 Thread Ted Yu
When I ran dev/run-tests , I got : File ./dev/run-tests.py, line 68, in __main__.identify_changed_files_from_git_commits Failed example: 'root' in [x.name for x in determine_modules_for_files( identify_changed_files_from_git_commits(50a0496a43, target_ref=6765ef9))] Exception raised:

Re: problems with build of latest the master

2015-07-15 Thread Ted Yu
I attached a patch for HADOOP-12235 BTW openstack was not mentioned in the first email from Gil. My email and Gil's second email were sent around the same moment. Cheers On Wed, Jul 15, 2015 at 2:06 AM, Steve Loughran ste...@hortonworks.com wrote: On 14 Jul 2015, at 12:22, Ted Yu yuzhih

Re: Trouble creating JIRA issue

2015-10-22 Thread Ted Yu
You can use the following link: https://issues.apache.org/jira/secure/CreateIssue!default.jspa Remember to select Spark as the project. On Thu, Oct 22, 2015 at 9:38 AM, Richard Marscher wrote: > Hi, > > I'm working on following the guidelines for contributing code to

Re: Exception when using some aggregate operators

2015-10-28 Thread Ted Yu
nal >> aggregate functions not supposed to be used or I am using them in the wrong >> way or is it a bug as I asked in my first mail. >> >> On Wed, Oct 28, 2015 at 3:20 AM, Ted Yu <yuzhih...@gmail.com> wrote: >> >>> Have you tried using avg in place o

Re: Exception when using some aggregate operators

2015-10-28 Thread Ted Yu
unsodh...@gmail.com > > wrote: > >> Also are the other aggregate functions to be treated as bugs or not? >> >> On Wed, Oct 28, 2015 at 4:08 PM, Shagun Sodhani <sshagunsodh...@gmail.com >> > wrote: >> >>> Wouldnt it be: >>> >>>

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-08 Thread Ted Yu
Why did you directly jump to spark-streaming-mqtt module ? Can you drop 'spark-streaming-mqtt' and try again ? Not sure why 1.5.0-SNAPSHOT showed up. Were you using RC2 source ? Cheers On Sun, Nov 8, 2015 at 7:28 PM, 欧锐 <494165...@qq.com> wrote: > > build spark-streaming-mqtt_2.10 failed! > >

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-08 Thread Ted Yu
+1 On Sat, Nov 7, 2015 at 4:35 PM, Denny Lee wrote: > +1 > > > On Sat, Nov 7, 2015 at 12:01 PM Mark Hamstra > wrote: > >> +1 >> >> On Tue, Nov 3, 2015 at 3:22 PM, Reynold Xin wrote: >> >>> Please vote on releasing the

Re: Seems jenkins is down (or very slow)?

2015-11-12 Thread Ted Yu
I was able to access the following where response was fast: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45806/ Cheers On Thu, Nov 12, 2015 at 6:21 PM, Yin Huai wrote: > Hi

Re: SparkPullRequestBuilder coverage

2015-11-13 Thread Ted Yu
It only runs tests that are impacted by the change. E.g. if you only > modify SQL, it won't run the core or streaming tests. > > > On Fri, Nov 13, 2015 at 11:17 AM, Ted Yu <yuzhih...@gmail.com> wrote: > >> Hi, >> I noticed that SparkPullRequestBuilder complet

Re: spark 1.4 GC issue

2015-11-15 Thread Ted Yu
Please take a look at http://www.infoq.com/articles/tuning-tips-G1-GC Cheers On Sat, Nov 14, 2015 at 10:03 PM, Renu Yadav wrote: > I have tried with G1 GC .Please if anyone can provide their setting for GC. > At code level I am : > 1.reading orc table usind dataframe > 2.map

Re: releasing Spark 1.4.2

2015-11-16 Thread Ted Yu
See this thread: http://search-hadoop.com/m/q3RTtLKc2ctNPcq=Re+Spark+1+4+2+release+and+votes+conversation+ > On Nov 15, 2015, at 10:53 PM, Niranda Perera wrote: > > Hi, > > I am wondering when spark 1.4.2 will be released? > > is it in the voting stage at the

Re: OLAP query using spark dataframe with cassandra

2015-11-09 Thread Ted Yu
Please consider using NoSQL engine such as hbase. Cheers > On Nov 9, 2015, at 3:03 PM, Andrés Ivaldi wrote: > > Hi, > I'm also considering something similar, Spark plain is too slow for my case, > a possible solution is use Spark as Multiple Source connector and basic >

Re: test failed due to OOME

2015-11-02 Thread Ted Yu
Looks like SparkListenerSuite doesn't OOM on QA runs compared to Jenkins builds. I wonder if this is due to difference between machines running QA tests vs machines running Jenkins builds. On Fri, Oct 30, 2015 at 1:19 PM, Ted Yu <yuzhih...@gmail.com> wrote: > I noticed that the Spa

Re: test failed due to OOME

2015-10-30 Thread Ted Yu
per-job basis (this doesn't > > scale that well). > > > > thoughts? > > > > On Fri, Oct 30, 2015 at 9:47 AM, Ted Yu <yuzhih...@gmail.com> wrote: > >> This happened recently on Jenkins: > >> > >> > https://amplab.cs.berkeley.edu/j

Re: unscribe

2015-11-01 Thread Ted Yu
Please take a look at first section of spark.apache.org/community FYI On Sun, Nov 1, 2015 at 1:09 AM, Chenxi Li wrote: > unscribe >

Re: Master build fails ?

2015-11-05 Thread Ted Yu
> Regards, > Dilip Biswal > Tel: 408-463-4980 > dbis...@us.ibm.com > > > > From: Ted Yu <yuzhih...@gmail.com> > To: Dilip Biswal/Oakland/IBM@IBMUS > Cc: Jean-Baptiste Onofré <j...@nanthrax.net>, "dev@spark.apache.org

Re: Master build fails ?

2015-11-05 Thread Ted Yu
ng able to find > com.google.common.hash.HashCodes. > > Is there a solution to this ? > > Regards, > Dilip Biswal > Tel: 408-463-4980 > dbis...@us.ibm.com > > > > From:Jean-Baptiste Onofré <j...@nanthrax.net> > To:Ted Yu <yuzhih...@gmail.com> > Cc:"de

Re: Build fails due to...multiple overloaded alternatives of constructor RDDInfo define default arguments?

2015-11-07 Thread Ted Yu
Created a PR for the compilation error: https://github.com/apache/spark/pull/9538 Cheers On Sat, Nov 7, 2015 at 4:41 AM, Jacek Laskowski wrote: > Hi, > > Checked out the latest sources and the build failed: > > [error] >

Re: Calling stop on StreamingContext locks up

2015-11-07 Thread Ted Yu
Would the following change work for you ? diff --git a/core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala b/core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala index 61b5a4c..c330d25 100644 ---

Re: State of the Build

2015-11-05 Thread Ted Yu
See previous discussion: http://search-hadoop.com/m/q3RTtPnPnzwOhBr FYI On Thu, Nov 5, 2015 at 4:30 PM, Stephen Boesch wrote: > Yes. The current dev/change-scala-version.sh mutates (/pollutes) the build > environment by updating the pom.xml in each of the subprojects. If you

Re: Master build fails ?

2015-11-06 Thread Ted Yu
Since maven is the preferred build vehicle, ivy style dependencies policy would produce surprising results compared to today's behavior. I would suggest staying with current dependencies policy. My two cents. On Fri, Nov 6, 2015 at 6:25 AM, Koert Kuipers wrote: > if there

Re: SparkLauncher#setJavaHome does not set JAVA_HOME in child process

2015-10-31 Thread Ted Yu
On Linux, I got the following test failure (with or without suggested change): testChildProcLauncher(org.apache.spark.launcher.SparkLauncherSuite) Time elapsed: 0.036 sec <<< FAILURE! java.lang.AssertionError: expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) at

Re: Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-08 Thread Ted Yu
Interesting https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/ shows green builds. On Thu, Oct 8, 2015 at 6:40 AM, Iulian Dragoș wrote: > Since Oct. 4 the build fails on 2.11 with the dreaded > > [error]

Re: Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-08 Thread Ted Yu
] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 17:49 min FYI On Thu, Oct 8, 2015 at 6:50 AM, Ted Yu <yuzhih...@gmail.com> wrote: > Interesting > > > https://amplab.cs.be

Re: Building Spark

2015-10-15 Thread Ted Yu
bq. Access is denied Please check permission of the path mentioned. On Thu, Oct 15, 2015 at 3:45 PM, Annabel Melongo < melongo_anna...@yahoo.com.invalid> wrote: > I was trying to build a cloned version of Spark on my local machine using > the command: > mvn -Pyarn -Phadoop-2.4

Re: SPARK_MASTER_IP actually expects a DNS name, not IP address

2015-10-16 Thread Ted Yu
for `SPARK_MASTER_IP`, amazingly, does not show it > being used in any place directly by Spark > <https://github.com/apache/spark/search?utf8=%E2%9C%93=SPARK_MASTER_IP>. > > Clearly, Spark is using this environment variable (otherwise I wouldn't > see the behavior described in my

Re: Problem building Spark

2015-10-19 Thread Ted Yu
See this thread http://search-hadoop.com/m/q3RTtV3VFNdgNri2=Re+Build+spark+1+5+1+branch+fails > On Oct 19, 2015, at 6:59 PM, Annabel Melongo > wrote: > > I tried to build Spark according to the build directions and the it failed > due to the following error:

test failed due to OOME

2015-10-18 Thread Ted Yu
From https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=spark-test/3846/console : SparkListenerSuite:- basic creation and shutdown of LiveListenerBus- bus.stop() waits for the event queue to completely drain- basic creation of StageInfo- basic

Re: SPARK_MASTER_IP actually expects a DNS name, not IP address

2015-10-14 Thread Ted Yu
Some old bits: http://stackoverflow.com/questions/28162991/cant-run-spark-1-2-in-standalone-mode-on-mac http://stackoverflow.com/questions/29412157/passing-hostname-to-netty FYI On Wed, Oct 14, 2015 at 7:10 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > I’m setting the Spark

Re: taking the heap dump when an executor goes OOM

2015-10-12 Thread Ted Yu
http://stackoverflow.com/questions/542979/using-heapdumponoutofmemoryerror-parameter-for-heap-dump-for-jboss > On Oct 11, 2015, at 10:45 PM, Niranda Perera wrote: > > Hi all, > > is there a way for me to get the heap-dump hprof of an executor jvm, when it > goes out

Re: Flaky Jenkins tests?

2015-10-12 Thread Ted Yu
You can go to: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN and see if the test failure(s) you encountered appeared there. FYI On Mon, Oct 12, 2015 at 1:24 PM, Meihua Wu wrote: > Hi Spark Devs, > > I recently encountered several cases

Re: Flaky Jenkins tests?

2015-10-12 Thread Ted Yu
in _get_connection > IndexError: pop from an empty deque > > > > On Mon, Oct 12, 2015 at 1:36 PM, Ted Yu <yuzhih...@gmail.com> wrote: > > You can go to: > > https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN > > > > and

Re: Flaky Jenkins tests?

2015-10-12 Thread Ted Yu
On October 12, 2015 at 2:45:13 PM, Ted Yu (yuzhih...@gmail.com) wrote: > > Can you re-submit your PR to trigger a new build - assuming the tests are > flaky ? > > If any test fails again, consider contacting the owner of the module for > expert opinion. > > Cheers

Re: Getting started

2015-10-13 Thread Ted Yu
Please see https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Tue, Oct 13, 2015 at 5:49 AM, _abhishek wrote: > Hello > I am interested in contributing to apache spark.I am new to open source.Can > someone please help me with how to get

Re: Compiling Spark with a local hadoop profile

2015-10-08 Thread Ted Yu
In root pom.xml : <hadoop.version>2.2.0</hadoop.version> You can override the version of hadoop with command similar to: -Phadoop-2.4 -Dhadoop.version=2.7.0 Cheers On Thu, Oct 8, 2015 at 11:22 AM, sbiookag wrote: > I'm modifying hdfs module inside hadoop, and would like the see the > reflection while

Re: The latest master branch didn't compile with -Phive?

2015-07-09 Thread Ted Yu
Looking at https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/2875/consoleFull : [error] [error] while compiling:

Re: The latest master branch didn't compile with -Phive?

2015-07-09 Thread Ted Yu
Owen so...@cloudera.com wrote: This is an error from scalac and not Spark. I find it happens frequently for me but goes away on a clean build. *shrug* On Thu, Jul 9, 2015 at 3:45 PM, Ted Yu yuzhih...@gmail.com wrote: Looking at https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven

Re: The latest master branch didn't compile with -Phive?

2015-07-09 Thread Ted Yu
streaming-flume-assembly/assembly Cheers On Thu, Jul 9, 2015 at 7:58 AM, Ted Yu yuzhih...@gmail.com wrote: From https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/2875/consoleFull : + build/mvn -DzincPort=3439 -DskipTests -Phadoop-2.4

Re: The latest master branch didn't compile with -Phive?

2015-07-10 Thread Ted Yu
/Spark-QA-Compile/ that the Maven compilation is now broken in master. On Thu, Jul 9, 2015 at 8:48 AM, Ted Yu yuzhih...@gmail.com wrote: I guess the compilation issue didn't surface in QA run because sbt was used: [info] Building Spark (w/Hive 0.13.1) using SBT with these arguments

Re: Spark master broken?

2015-07-12 Thread Ted Yu
Jenkins shows green builds. What Java version did you use ? Cheers On Sun, Jul 12, 2015 at 3:49 AM, René Treffer rtref...@gmail.com wrote: Hi *, I'm currently trying to build master but it fails with [error] Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar [error]

Re: Spark Cannot Connect to HBaseClusterSingleton

2015-08-26 Thread Ted Yu
patch into my GSoC Jira issue you mentioned and then we can continue at there. Before I do that stuff, I wanted to get Spark dev community's ideas to solve my problem due to you may have faced such kind of problems before. On 26 Aug 2015 at 17:13, Ted Yu yuzhih...@gmail.com wrote: I found GORA

Re: Spark Cannot Connect to HBaseClusterSingleton

2015-08-26 Thread Ted Yu
The connection failure was to zookeeper. Have you verified that localhost:2181 can serve requests ? What version of hbase was Gora built against ? Cheers On Aug 26, 2015, at 1:50 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi, I start an Hbase cluster for my test class. I use that

Re: Spark Cannot Connect to HBaseClusterSingleton

2015-08-26 Thread Ted Yu
works without any error. Hbase version is 0.98.8-hadoop2 and I use Spark 1.3.1 Kind Regards, Furkan KAMACI On 26 Aug 2015 at 12:08, Ted Yu yuzhih...@gmail.com wrote: The connection failure was to zookeeper. Have you verified that localhost:2181 can serve requests ? What version

Re: Spark Cannot Connect to HBaseClusterSingleton

2015-08-26 Thread Ted Yu
/HBaseContextSuite.scala --If you want to look at the old stuff before it went into HBase https://github.com/cloudera-labs/SparkOnHBase Let me know if that helps On Wed, Aug 26, 2015 at 5:40 AM, Ted Yu yuzhih...@gmail.com wrote: Can you log the contents of the Configuration you pass from Spark

Re: (send this email to subscribe)

2015-09-13 Thread Ted Yu
See first section of http://spark.apache.org/community.html Cheers > On Sep 13, 2015, at 6:43 PM, 蒋林 wrote: > > Hi,I need subscribe email list,please send me,thank you > > >

Re: spark dataframe transform JSON to ORC meet “column ambigous exception”

2015-09-12 Thread Ted Yu
Is it possible that Canonical_URL occurs more than once in your json ? Can you check your json input ? Thanks On Sat, Sep 12, 2015 at 2:05 AM, Fengdong Yu wrote: > Hi, > > I am using spark1.4.1 data frame, read JSON data, then save it to orc. the > code is very

Re: spark dataframe transform JSON to ORC meet “column ambigous exception”

2015-09-12 Thread Ted Yu
> Azuryy Yu > Sr. Infrastructure Engineer > > cel: 158-0164-9103 > wetchat: azuryy > > > On Sat, Sep 12, 2015 at 5:52 PM, Ted Yu <yuzhih...@gmail.com> wrote: > >> Is it possible that Canonical_URL occurs more than once in your json ? >> >> Can you check

Re: [ANNOUNCE] Announcing Spark 1.5.0

2015-09-11 Thread Ted Yu
ust choose the master branch and 1.5.0, a correct hadoop version >> (default to 2.2.0 though) and there you go :-) >> >> >> On Wed, Sep 9, 2015 at 6:39 PM Ted Yu <yuzhih...@gmail.com> wrote: >> >>> Jerry: >>> I just tried building hbase-spark module

Re: Spark 1.5.1 - Scala 2.10 - Hadoop 1 package is missing from S3

2015-10-04 Thread Ted Yu
hadoop1 package for Scala 2.10 wasn't in RC1 either: http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/ On Sun, Oct 4, 2015 at 5:17 PM, Nicholas Chammas wrote: > I’m looking here: > > https://s3.amazonaws.com/spark-related-packages/ > > I believe

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread Ted Yu
I tried to access https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/spark-streaming_2.10-1.5.0.pom on Chrome and Firefox (on Mac) I got 404 FYI On Fri, Oct 2, 2015 at 10:49 AM, andy petrella wrote: > Yup folks, > > I've been reported by someone

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread Ted Yu
Oct. 2015 20:08, Ted Yu <yuzhih...@gmail.com> wrote: > >> Andy: >> 1.5.1 has been released. >> >> Maybe you can use this: >> >> https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.1/spark-streaming_2.10-1.5.1.pom >> >>

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread Ted Yu
too (did not get it before). Maybe the servers are > having issues. > > On Fri, Oct 2, 2015 at 11:05 AM, Ted Yu <yuzhih...@gmail.com> wrote: > > I tried to access > > > https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/spark-streaming_2.10-1.5.

Re: failed to run spark sample on windows

2015-09-28 Thread Ted Yu
What version of hadoop are you using ? Is that version consistent with the one which was used to build Spark 1.4.0 ? Cheers On Mon, Sep 28, 2015 at 4:36 PM, Renyi Xiong wrote: > I tried to run HdfsTest sample on windows spark-1.4.0 > > bin\run-sample

Re: [ANNOUNCE] Announcing Spark 1.5.0

2015-09-09 Thread Ted Yu
Jerry: I just tried building hbase-spark module with 1.5.0 and I see: ls -l ~/.m2/repository/org/apache/spark/spark-core_2.10/1.5.0 total 21712 -rw-r--r-- 1 tyu staff 196 Sep 9 09:37 _maven.repositories -rw-r--r-- 1 tyu staff 11081542 Sep 9 09:37 spark-core_2.10-1.5.0.jar -rw-r--r--

Re: Spark 1.5: How to trigger expression execution through UnsafeRow/TungstenProject

2015-09-09 Thread Ted Yu
Here is the example from Reynold ( http://search-hadoop.com/m/q3RTtfvs1P1YDK8d) : scala> val data = sc.parallelize(1 to size, 5).map(x => (util.Random.nextInt(size / repetitions),util.Random.nextDouble)).toDF("key", "value") data: org.apache.spark.sql.DataFrame = [key: int, value: double] scala>

Re: IllegalArgumentException: Size exceeds Integer.MAX_VALUE

2015-10-05 Thread Ted Yu
As a workaround, can you set the number of partitions higher in the sc.textFile method ? Cheers On Mon, Oct 5, 2015 at 3:31 PM, Jegan wrote: > Hi All, > > I am facing the below exception when the size of the file being read in a > partition is above 2GB. This is apparently
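The workaround above comes down to simple arithmetic: pick enough partitions that no single one approaches the 2 GB (Integer.MAX_VALUE bytes) ceiling mentioned in the quoted report. The following is a minimal sketch of that calculation only. The helper name `min_partitions`, the `headroom` factor, and the assumption that the file splits evenly are all illustrative and not taken from the thread; the result would be passed as the second (`minPartitions`) argument of `sc.textFile`.

```python
# Integer.MAX_VALUE on the JVM, used here as the assumed per-partition ceiling.
INT_MAX = 2**31 - 1


def min_partitions(file_size_bytes, limit=INT_MAX, headroom=2):
    """Hypothetical helper: smallest partition count that keeps each
    partition at or below limit // headroom bytes, assuming an even split."""
    target = limit // headroom
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return max(1, (file_size_bytes + target - 1) // target)


# Example: a 10 GiB input file.
print(min_partitions(10 * 1024**3))
```

Real input splits are rarely perfectly even, which is why the sketch leaves headroom instead of targeting the limit exactly.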

Re: SparkR installation not working

2015-09-19 Thread Ted Yu
Looks like you didn't specify sparkr profile when building. Cheers On Sat, Sep 19, 2015 at 12:30 PM, Devl Devel wrote: > Hi All, > > I've built spark 1.5.0 with hadoop 2.6 with a fresh download : > > build/mvn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean

Re: Using scala-2.11 when making changes to spark source

2015-09-20 Thread Ted Yu
Maybe the following can be used for changing Scala version: http://maven.apache.org/archetype/maven-archetype-plugin/ I played with it a little bit but didn't get far. FYI On Sun, Sep 20, 2015 at 6:18 AM, Stephen Boesch wrote: > > The dev/change-scala-version.sh [2.11]

Re: Derby version in Spark

2015-09-22 Thread Ted Yu
Which Spark release are you building ? For master branch, I get the following: lib_managed/jars/datanucleus-api-jdo-3.2.6.jar lib_managed/jars/datanucleus-core-3.2.10.jar lib_managed/jars/datanucleus-rdbms-3.2.9.jar FYI On Tue, Sep 22, 2015 at 1:28 PM, Richard Hillegas

Re: Derby version in Spark

2015-09-22 Thread Ted Yu
xml-apis-1.4.01.jar > commons-math-2.2.jar jaxb-impl-2.2.3-1.jar paranamer-2.3.jar > xmlenc-0.52.jar > commons-math3-3.4.1.jar jaxb-impl-2.2.7.jar paranamer-2.6.jar xz-1.0.jar > commons-net-3.1.jar jblas-1.2.4.jar parquet-avro-1.7.0.jar > zookeeper-3.4.5.jar > commons-pool-1.5.

Re: Derby version in Spark

2015-09-22 Thread Ted Yu
I cloned Hive 1.2 code base and saw: <derby.version>10.10.2.0</derby.version> So the version used by Spark is quite close to what Hive uses. On Tue, Sep 22, 2015 at 3:29 PM, Ted Yu <yuzhih...@gmail.com> wrote: > I see. > I use maven to build so I observe different contents under lib_managed > dire

Re: passing SparkContext as parameter

2015-09-21 Thread Ted Yu
You can use broadcast variable for passing connection information. Cheers > On Sep 21, 2015, at 4:27 AM, Priya Ch wrote: > > can i use this sparkContext on executors ?? > In my application, i have scenario of reading from db for certain records in > rdd. Hence I

Re: How to modify Hadoop APIs used by Spark?

2015-09-21 Thread Ted Yu
Can you clarify what you want to do: If you modify existing hadoop InputFormat, etc, it would be a matter of rebuilding hadoop and build Spark using the custom built hadoop as dependency. Do you introduce new InputFormat ? Cheers On Mon, Sep 21, 2015 at 1:20 PM, Dogtail Ray

Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-02 Thread Ted Yu
I tried to run test suite and encountered the following: http://pastebin.com/DPnwMGrm FYI On Wed, Dec 2, 2015 at 12:39 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > -0 > > If spark-ec2 is still a supported part of the project, then we should > update its version lists as new

Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-02 Thread Ted Yu
+1 Ran through test suite (minus docker-integration-tests) which passed. Overall experience was much better compared with some of the prior RC's. [INFO] Spark Project External Kafka ... SUCCESS [ 53.956 s] [INFO] Spark Project Examples . SUCCESS

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Ted Yu
Ran test suite (minus docker-integration-tests) All passed +1 [INFO] Spark Project External ZeroMQ .. SUCCESS [ 13.647 s] [INFO] Spark Project External Kafka ... SUCCESS [ 45.424 s] [INFO] Spark Project Examples . SUCCESS [02:06

Re: spark with label nodes in yarn

2015-12-15 Thread Ted Yu
k 1.5.0, what happened to me > was I was blocked to get the YARN containers by setting > spark.yarn.executor.nodeLabelExpression property. My question, > https://issues.apache.org/jira/browse/SPARK-7173 will fix this? > > > > Thanks > > Allen > > > > > >

Re: Maven build against Hadoop 2.4 times out

2015-12-14 Thread Ted Yu
.6 is pretty close to master, > I am wondering if there is any environment related issue. > > On Sun, Dec 13, 2015 at 3:38 PM, Ted Yu <yuzhih...@gmail.com> wrote: > >> Thanks for checking, Yin. >> >> Looks like the cause might be in one of the commits for build #4438

Re: Maven build against Hadoop 2.4 times out

2015-12-13 Thread Ted Yu
since 4438 and 4439 were failed > way before the thrift server tests. > > On Fri, Dec 11, 2015 at 10:27 AM, Ted Yu <yuzhih...@gmail.com> wrote: > >> Hi, >> You may have noticed that maven build against Hadoop 2.4 times out on >> Jenkins. >> >> The last

Maven build against Hadoop 2.4 times out

2015-12-11 Thread Ted Yu
Hi, You may have noticed that maven build against Hadoop 2.4 times out on Jenkins. The last module is spark-hive-thriftserver This seemed to start with build #4440 FYI - To unsubscribe, e-mail:

Re: spark with label nodes in yarn

2015-12-15 Thread Ted Yu
mailto:sai.sai.s...@gmail.com] > *Sent:* December 15, 2015 18:07 > *To:* 张志强(旺轩) > *Cc:* Ted Yu; dev > *Subject:* Re: spark with label nodes in yarn > > > > SPARK-6470 only supports node label expression for executors. > > SPARK-7173 supports node label expression for A

Re: does spark really support label expr like && or || ?

2015-12-16 Thread Ted Yu
Allen: Since you mentioned scheduling, I assume you were talking about node label support in YARN. If that is the case, can you give us some more information: How node labels are setup in YARN cluster How you specified node labels in application Hadoop and Spark releases you are using Cheers >

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Ted Yu
+1 > On Jan 5, 2016, at 10:49 AM, Davies Liu wrote: > > +1 > > On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas > wrote: >> +1 >> >> Red Hat supports Python 2.6 on REHL 5 until 2020, but otherwise yes, Python >> 2.6 is ancient history and

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-22 Thread Ted Yu
Running test suite, there was timeout in hive-thriftserver module. This has been fixed by SPARK-11823. So I assume this is test issue. lgtm On Tue, Dec 22, 2015 at 2:28 PM, Benjamin Fradet wrote: > +1 > On 22 Dec 2015 9:54 p.m., "Andrew Or"

Re: Akka with Spark

2015-12-26 Thread Ted Yu
Do you mind sharing your use case ? It may be possible to use a different approach than Akka. Cheers On Sat, Dec 26, 2015 at 10:08 AM, Disha Shrivastava wrote: > Hi, > > I wanted to know how to use Akka framework with Spark starting from > basics. I saw online that Spark

recurring test failures against hadoop-2.4 profile

2015-12-25 Thread Ted Yu
Hi, You may have noticed the following test failures: org.apache.spark.sql.hive.execution.HiveUDFSuite.UDFIntegerToString org.apache.spark.sql.hive.execution.SQLQuerySuite.udf_java_method Tracing backwards, they started failing since this build:

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-25 Thread Ted Yu
I found that SBT build for Scala 2.11 has been failing ( https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-branch-1.6-COMPILE-SBT-SCALA-2.11/3/consoleFull ) I logged SPARK-12527 and sent a PR. FYI On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust

Re: what is the best way to debug spark / mllib?

2015-12-27 Thread Ted Yu
For #1, 9 minutes seem to be normal. Here was duration for recent build on master branch: [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 10:44

Re: Akka with Spark

2015-12-27 Thread Ted Yu
a processes separate from Spark processes, so you can >>>> monitor, debug, and scale them independently. So consider streaming data >>>> from Akka to Spark Streaming or go the other way, from Spark to Akka >>>> Streams. >>>> >>>> dean >>>

Re: [DAGScheduler] resubmitFailedStages, failedStages.clear() and submitStage

2015-12-24 Thread Ted Yu
getMissingParentStages(stage) would be called for the stage (being re-submitted) If there is no missing parents, submitMissingTasks() would be called. If there is missing parent(s), the parent would go through the same flow. I don't see issue in this part of the code. Cheers On Thu, Dec 24,
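The recursive flow described in this message can be illustrated with a toy model. This is a sketch only, not Spark's DAGScheduler code: the function `submit_stage`, the `parents` dictionary, and the `waiting` list are hypothetical names, and parking a stage until its parents finish is an assumption of the sketch that the message itself does not spell out.

```python
def submit_stage(stage, parents, finished, submitted, waiting):
    """Toy model of the flow described above (not Spark's actual code)."""
    if stage in submitted or stage in waiting:
        return  # already handled on an earlier path through the graph
    # Analogue of getMissingParentStages(stage): parents not yet finished.
    missing = [p for p in parents.get(stage, []) if p not in finished]
    if not missing:
        # Analogue of submitMissingTasks(): nothing blocks this stage.
        submitted.append(stage)
        return
    # Each missing parent goes through the same flow first.
    for parent in missing:
        submit_stage(parent, parents, finished, submitted, waiting)
    # Assumption of this sketch: the stage is parked until its parents finish.
    waiting.append(stage)


# Hypothetical three-stage chain: C depends on B, B depends on A.
parents = {"C": ["B"], "B": ["A"]}
submitted, waiting = [], []
submit_stage("C", parents, finished=set(), submitted=submitted, waiting=waiting)
print(submitted, waiting)
```

In this chain only the root stage A has no missing parents, so it is the only one submitted on the first pass; B and C wait behind it.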

Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-21 Thread Ted Yu
In Jerry's example, the first SparkContext, sc, has been stopped. So there would be only one SparkContext running at any given moment. Cheers On Mon, Dec 21, 2015 at 8:23 AM, Chester @work wrote: > Jerry > I thought you should not create more than one SparkContext

IndentationCheck of checkstyle

2015-12-29 Thread Ted Yu
Hi, I noticed that there are a lot of checkstyle warnings in the following form: To my knowledge, we use two spaces for each tab. Not sure why all of a sudden we have so many IndentationCheck warnings: grep 'hild have incorrect indentati' trunkCheckstyle.xml | wc 3133 52645 678294 If

Re: IndentationCheck of checkstyle

2015-12-29 Thread Ted Yu
>> >> >> format issue I think, go ahead >> >> >> >> >> At 2015-12-30 13:36:05, "Ted Yu" <yuzhih...@gmail.com> wrote: >> Hi, >> I noticed that there are a lot of checkstyle warnings in the following form: >> >>
