RE: Announcing Spark 1.0.0
Hi all,

In https://spark.apache.org/downloads.html, the URL for the release notes of 1.0.0 seems to be wrong. The URL should be https://spark.apache.org/releases/spark-release-1-0-0.html but it links to https://spark.apache.org/releases/spark-release-1.0.0.html.

Best Regards,
Kousuke

From: prabeesh k [mailto:prabsma...@gmail.com]
Sent: Friday, May 30, 2014 8:18 PM
To: user@spark.apache.org
Subject: Re: Announcing Spark 1.0.0

I forgot to hard refresh. Thanks.

On Fri, May 30, 2014 at 4:18 PM, Patrick Wendell <pwend...@gmail.com> wrote:

It is updated - try holding Shift + refresh in your browser; you are probably caching the page.

On Fri, May 30, 2014 at 3:46 AM, prabeesh k <prabsma...@gmail.com> wrote:

Please update the http://spark.apache.org/docs/latest/ link.

On Fri, May 30, 2014 at 4:03 PM, Margusja <mar...@roo.ee> wrote:

Is it possible to download a pre-built package? http://mirror.symnds.com/software/Apache/incubator/spark/spark-1.0.0/spark-1.0.0-bin-hadoop2.tgz gives me a 404.

Best regards,
Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)

On 30/05/14 13:18, Christopher Nguyen wrote:

Awesome work, Pat et al.!

--
Christopher T. Nguyen
Co-founder & CEO, Adatao
http://adatao.com
http://linkedin.com/in/ctnguyen

On Fri, May 30, 2014 at 3:12 AM, Patrick Wendell <pwend...@gmail.com> wrote:

I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 is a milestone release as the first in the 1.0 line of releases, providing API stability for Spark's core interfaces.

Spark 1.0.0 is Spark's largest release ever, with contributions from 117 developers. I'd like to thank everyone involved in this release - it was truly a community effort with fixes, features, and optimizations contributed from dozens of organizations.

This release expands Spark's standard libraries, introducing a new SQL package (SparkSQL) which lets users integrate SQL queries into existing Spark workflows. MLlib, Spark's machine learning library, is expanded with sparse vector support and several new algorithms. The GraphX and Streaming libraries also introduce new features and optimizations. Spark's core engine adds support for secured YARN clusters, a unified tool for submitting Spark applications, and several performance and stability improvements. Finally, Spark adds support for Java 8 lambda syntax and improves coverage of the Java and Python APIs.

Those features only scratch the surface - check out the release notes here:
http://spark.apache.org/releases/spark-release-1-0-0.html

Note that since release artifacts were posted recently, certain mirrors may not have working downloads for a few hours.

- Patrick
Re: JMXSink for YARN deployment
Hi Vladimir,

How about using the --files option with spark-submit?

- Kousuke

(2014/09/11 23:43), Vladimir Tretyakov wrote:

Hi again, yeah, I've tried to use "spark.metrics.conf" before my question on the ML, had no luck :( Any other ideas from somebody? It seems nobody uses metrics in YARN deployment mode. How about Mesos? I didn't try, but maybe Spark has the same difficulties on Mesos?

PS: Spark is a great thing in general; it will be nice to see metrics in YARN/Mesos mode, not only in Standalone :)

On Thu, Sep 11, 2014 at 5:25 PM, Shao, Saisai <saisai.s...@intel.com> wrote:

I think you can try to use "spark.metrics.conf" to manually specify the path of metrics.properties, but the prerequisite is that each container should find this file in its local FS, because this file is loaded locally. Besides, I think this is a kind of workaround; a better solution would be to fix this some other way.

Thanks
Jerry

From: Vladimir Tretyakov [mailto:vladimir.tretya...@sematext.com]
Sent: Thursday, September 11, 2014 10:08 PM
Cc: user@spark.apache.org
Subject: Re: JMXSink for YARN deployment

Hi Shao, thanks for the explanation. Any ideas how to fix it? Where should I put the metrics.properties file?

On Thu, Sep 11, 2014 at 4:18 PM, Shao, Saisai <saisai.s...@intel.com> wrote:

Hi,

I'm guessing the problem is that the driver or executor cannot get the metrics.properties configuration file in the YARN container, so the metrics system cannot load the right sinks.

Thanks
Jerry

From: Vladimir Tretyakov [mailto:vladimir.tretya...@sematext.com]
Sent: Thursday, September 11, 2014 7:30 PM
To: user@spark.apache.org
Subject: JMXSink for YARN deployment

Hello, we at Sematext (https://apps.sematext.com/) are writing a monitoring tool for Spark, and we came across one question: how do we enable JMX metrics for a YARN deployment?

We put

*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink

into the file $SPARK_HOME/conf/metrics.properties, but it doesn't work. Everything works in Standalone mode, but not in YARN mode. Can somebody help? Thanks!

PS: I've also found https://stackoverflow.com/questions/23529404/spark-on-yarn-how-to-send-metrics-to-graphite-sink/25786112 without an answer.
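In case it helps, here is a rough sketch of the --files approach mentioned at the top of this thread (untested on my side; the paths, application class, and jar name are just placeholders). The idea is to ship metrics.properties into each YARN container's working directory and point spark.metrics.conf at the shipped file name:

  # conf/metrics.properties - enable the JMX sink for all instances (driver and executors)
  *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink

  # distribute the file to every container and tell Spark to load it by its local name
  $SPARK_HOME/bin/spark-submit \
    --master yarn-cluster \
    --files /path/to/metrics.properties \
    --conf spark.metrics.conf=metrics.properties \
    --class com.example.YourApp \
    your-app.jar

Because --files places the file in each container's current working directory, the relative name metrics.properties should be resolvable by both the driver and the executors, which is the prerequisite Jerry described.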
Re: NoClassDefFoundError encountered in Spark 1.2-snapshot build with hive-0.13.1 profile
Hi Terry,

I think the issue you mentioned will be resolved by the following PR.
https://github.com/apache/spark/pull/3072

- Kousuke

(2014/11/03 10:42), Terry Siu wrote:

I just built the 1.2 snapshot current as of commit 76386e1a23c using:

$ ./make-distribution.sh --tgz --name my-spark --skip-java-test -DskipTests -Phadoop-2.4 -Phive -Phive-0.13.1 -Pyarn

I drop my Hive configuration files into the conf directory, launch spark-shell, and then create my HiveContext, hc. I then issue a "use db" command:

scala> hc.hql("use db")

and receive the following class-not-found error:

java.lang.NoClassDefFoundError: com/esotericsoftware/shaded/org/objenesis/strategy/InstantiatorStrategy
    at org.apache.hadoop.hive.ql.exec.Utilities.<clinit>(Utilities.java:925)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1224)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
    at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:315)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:286)
    at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
    at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
    at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
    at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:424)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:424)
    at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
    at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:103)
    at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:111)
    at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:115)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
    at $iwC$$iwC$$iwC.<init>(<console>:48)
    at $iwC$$iwC.<init>(<console>:50)
    at $iwC.<init>(<console>:52)
    at <init>(<console>:54)
    at .<init>(<console>:58)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:8
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
    at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scal
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scal
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:353)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at
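For what it's worth, one way to check whether a given build actually contains the shaded Objenesis classes is to list the contents of the assembly jar. This is only a rough diagnostic; the jar path below assumes a make-distribution.sh build and will vary with your Spark/Hadoop versions:

  # look for the shaded objenesis classes inside the Spark assembly jar
  jar tf dist/lib/spark-assembly-*.jar | grep 'com/esotericsoftware/shaded/org/objenesis'

If nothing is printed, the shaded classes are missing from the assembly, which would match the NoClassDefFoundError above.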
Re: Slave Node Management in Standalone Cluster
Hi Kenichi,

> 1. How can I stop a slave on the specific node?
>    Under `sbin/` directory, there are `start-{all,master,slave,slaves}` and
>    `stop-{all,master,slaves}`, but no `stop-slave`. Is there any way to stop
>    a specific (e.g., the 2nd) slave via the command line?

You can use sbin/spark-daemon.sh on the machine where the worker you'd like to stop is running. First, find the PID file of that worker. The PID files are under /tmp/ by default, and the file name looks like the following.

xxx.org.apache.spark.deploy.worker.Worker-WorkerID.pid

After you find the PID file, run the following command with the worker ID taken from the file name.

sbin/spark-daemon.sh stop org.apache.spark.deploy.worker.Worker WorkerID

> 2. How can I check cluster status from command line?
>    Is there any way to confirm that all Master / Workers are up and working
>    without using the Web UI?

AFAIK, there are no command line tools for checking the status of a standalone cluster. Instead, you can use a special URL like the following.

http://<master or worker's hostname>:<web UI port>/json

You can get the Master and Worker status as JSON-formatted data.

- Kousuke

(2014/11/18 0:27), Kenichi Maehashi wrote:

Hi,

I'm operating Spark in a standalone cluster configuration (3 slaves) and have some questions.

1. How can I stop a slave on the specific node?
   Under the `sbin/` directory, there are `start-{all,master,slave,slaves}` and `stop-{all,master,slaves}`, but no `stop-slave`. Is there any way to stop a specific (e.g., the 2nd) slave via the command line?

2. How can I check cluster status from the command line?
   Is there any way to confirm that all Master / Workers are up and working without using the Web UI?

Thanks in advance!
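To make that concrete, here is a rough sketch of the whole sequence (untested as written; the worker number, hostnames, and ports are placeholders, with 8080/8081 being the default master/worker web UI ports):

  # on the machine running the worker: find its PID file (worker number 1 in this example)
  ls /tmp/*.org.apache.spark.deploy.worker.Worker-1.pid

  # stop that specific worker
  $SPARK_HOME/sbin/spark-daemon.sh stop org.apache.spark.deploy.worker.Worker 1

  # check master / worker status as JSON from the command line
  curl http://<master-host>:8080/json
  curl http://<worker-host>:8081/json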
Please add our meetup home page in Japan.
Hi folks,

We have lots of Spark enthusiasts, and some organizations have held talk events in Tokyo, Japan. Now we're going to unify those events, and we have created our home page on meetup.com.

http://www.meetup.com/Tokyo-Spark-Meetup/

Could you add this to the list? Thanks.

- Kousuke Saruta
Spark 1.6.1 binary pre-built for Hadoop 2.6 may be broken
Hi all,

I noticed the binaries pre-built for Hadoop 2.6 (and others), which we can download from spark.apache.org/downloads.html (Direct Download), may be broken. I couldn't decompress at least the following 4 tgzs with the "tar xfzv" command, and their MD5 checksums didn't match.

* spark-1.6.1-bin-hadoop2.6.tgz
* spark-1.6.1-bin-hadoop2.4.tgz
* spark-1.6.1-bin-hadoop2.3.tgz
* spark-1.6.1-bin-cdh4.tgz

The following 3 tgzs were decompressed successfully.

* spark-1.6.1-bin-hadoop1.tgz
* spark-1.6.1-bin-without-hadoop.tgz
* spark-1.6.1.tgz

Regards,
Kousuke
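For reference, the kind of check I ran was roughly the following (file name as an example; this assumes the .md5 checksum file published alongside the artifact on the download site):

  # compute the local checksum and compare it with the published one
  md5sum spark-1.6.1-bin-hadoop2.6.tgz
  cat spark-1.6.1-bin-hadoop2.6.tgz.md5

  # try to decompress
  tar xfzv spark-1.6.1-bin-hadoop2.6.tgz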