RE: Announcing Spark 1.0.0

2014-05-30 Thread Kousuke Saruta
Hi all,


In https://spark.apache.org/downloads.html, the URL for the 1.0.0 release notes
seems to be wrong.

It should be https://spark.apache.org/releases/spark-release-1-0-0.html
but it currently links to https://spark.apache.org/releases/spark-release-1.0.0.html


Best Regards,

Kousuke


From: prabeesh k [mailto:prabsma...@gmail.com] 
Sent: Friday, May 30, 2014 8:18 PM
To: user@spark.apache.org
Subject: Re: Announcing Spark 1.0.0


I forgot to do a hard refresh.

Thanks.


On Fri, May 30, 2014 at 4:18 PM, Patrick Wendell pwend...@gmail.com wrote:

It is updated - try holding Shift + refresh in your browser; you are
probably caching the page.


On Fri, May 30, 2014 at 3:46 AM, prabeesh k prabsma...@gmail.com wrote:
 Please update the http://spark.apache.org/docs/latest/  link


 On Fri, May 30, 2014 at 4:03 PM, Margusja mar...@roo.ee wrote:

 Is it possible to download a pre-built package?

 http://mirror.symnds.com/software/Apache/incubator/spark/spark-1.0.0/spark-1.0.0-bin-hadoop2.tgz
 - gives me 404

 Best regards, Margus (Margusja) Roo
 +372 51 48 780
 http://margus.roo.ee
 http://ee.linkedin.com/in/margusroo
 skype: margusja
 ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)


 On 30/05/14 13:18, Christopher Nguyen wrote:

 Awesome work, Pat et al.!

 --
 Christopher T. Nguyen
 Co-founder & CEO, Adatao (http://adatao.com)
 http://linkedin.com/in/ctnguyen




 On Fri, May 30, 2014 at 3:12 AM, Patrick Wendell pwend...@gmail.com wrote:

 I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0
 is a milestone release as the first in the 1.0 line of releases,
 providing API stability for Spark's core interfaces.

 Spark 1.0.0 is Spark's largest release ever, with contributions from
 117 developers. I'd like to thank everyone involved in this release -
 it was truly a community effort with fixes, features, and
 optimizations contributed from dozens of organizations.

 This release expands Spark's standard libraries, introducing a new SQL
 package (SparkSQL) which lets users integrate SQL queries into
 existing Spark workflows. MLlib, Spark's machine learning library, is
 expanded with sparse vector support and several new algorithms. The
 GraphX and Streaming libraries also introduce new features and
 optimizations. Spark's core engine adds support for secured YARN
 clusters, a unified tool for submitting Spark applications, and
 several performance and stability improvements. Finally, Spark adds
 support for Java 8 lambda syntax and improves coverage of the Java and
 Python APIs.

 Those features only scratch the surface - check out the release
 notes here:
 http://spark.apache.org/releases/spark-release-1-0-0.html

 Note that since release artifacts were posted recently, certain
 mirrors may not have working downloads for a few hours.

 - Patrick








Re: JMXSink for YARN deployment

2014-09-11 Thread Kousuke Saruta

Hi Vladimir,

How about using the --files option with spark-submit?
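
For example, something like this (just a sketch; the master, class, and jar
names are placeholders, and this assumes a spark-submit new enough to support
--conf):

# ship metrics.properties into each YARN container's working directory and
# point the metrics system at it (the relative path resolves there)
spark-submit --master yarn-cluster \
  --files /path/to/metrics.properties \
  --conf spark.metrics.conf=metrics.properties \
  --class com.example.MyApp \
  my-app.jar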

- Kousuke

(2014/09/11 23:43), Vladimir Tretyakov wrote:
Hi again. Yeah, I had already tried "spark.metrics.conf" before asking my
question on the ML, but had no luck :(

Any other ideas from anybody?
It seems nobody uses metrics in YARN deployment mode.
How about Mesos? I didn't try it, but maybe Spark has the same
difficulties on Mesos?


PS: Spark is a great thing in general; it would be nice to see metrics in
YARN/Mesos mode, not only in Standalone :)



On Thu, Sep 11, 2014 at 5:25 PM, Shao, Saisai saisai.s...@intel.com wrote:


I think you can try to use "spark.metrics.conf" to manually
specify the path of metrics.properties, but the prerequisite is
that each container can find this file on its local FS,
because this file is loaded locally.

Besides, I think this is something of a workaround; a better
solution would be to fix this properly in some other way.

Thanks

Jerry

*From:* Vladimir Tretyakov [mailto:vladimir.tretya...@sematext.com]
*Sent:* Thursday, September 11, 2014 10:08 PM
*Cc:* user@spark.apache.org
*Subject:* Re: JMXSink for YARN deployment

Hi Shao, thanks for the explanation. Any ideas how to fix it? Where
should I put the metrics.properties file?

On Thu, Sep 11, 2014 at 4:18 PM, Shao, Saisai saisai.s...@intel.com wrote:

Hi,

I'm guessing the problem is that the driver or executor cannot get the
metrics.properties configuration file in the YARN container, so the
metrics system cannot load the right sinks.

Thanks

Jerry

*From:* Vladimir Tretyakov [mailto:vladimir.tretya...@sematext.com]
*Sent:* Thursday, September 11, 2014 7:30 PM
*To:* user@spark.apache.org
*Subject:* JMXSink for YARN deployment

Hello, we at Sematext (https://apps.sematext.com/) are writing a
monitoring tool for Spark and we came across one question:

How to enable JMX metrics for YARN deployment?

We put *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink

into the file $SPARK_HOME/conf/metrics.properties, but it doesn't work.

Everything works in Standalone mode, but not in YARN mode.

Can somebody help?

Thx!

PS: I've also found

https://stackoverflow.com/questions/23529404/spark-on-yarn-how-to-send-metrics-to-graphite-sink/25786112
which has no answer.






Re: NoClassDefFoundError encountered in Spark 1.2-snapshot build with hive-0.13.1 profile

2014-11-03 Thread Kousuke Saruta

Hi Terry,

I think the issue you mentioned will be resolved by the following PR:
https://github.com/apache/spark/pull/3072

- Kousuke

(2014/11/03 10:42), Terry Siu wrote:

I just built the 1.2 snapshot current as of commit 76386e1a23c using:

$ ./make-distribution.sh --tgz --name my-spark --skip-java-test -DskipTests -Phadoop-2.4 -Phive -Phive-0.13.1 -Pyarn


I drop my Hive configuration files into the conf directory, launch
spark-shell, and then create my HiveContext, hc. I then issue a "use
db" command:


scala> hc.hql("use db")

and receive the following class-not-found error:

java.lang.NoClassDefFoundError: com/esotericsoftware/shaded/org/objenesis/strategy/InstantiatorStrategy
at org.apache.hadoop.hive.ql.exec.Utilities.<clinit>(Utilities.java:925)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1224)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:315)
at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:286)
at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:424)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:424)
at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:103)
at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:111)
at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:115)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
at $iwC$$iwC$$iwC.<init>(<console>:48)
at $iwC$$iwC.<init>(<console>:50)
at $iwC.<init>(<console>:52)
at <init>(<console>:54)
at .<init>(<console>:58)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:8
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scal
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scal
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:353)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at

Re: Slave Node Management in Standalone Cluster

2014-11-18 Thread Kousuke Saruta

Hi Kenichi,


1. How can I stop a slave on a specific node?
   Under the `sbin/` directory, there are
`start-{all,master,slave,slaves}` and `stop-{all,master,slaves}`, but
no `stop-slave`. Is there any way to stop a specific (e.g., the
2nd) slave via the command line?


You can use sbin/spark-daemon.sh on the machine where the worker you'd
like to stop is running.
First, find the PID file of the worker you'd like to stop.

The PID file is under /tmp by default and its name looks like the following:

xxx.org.apache.spark.deploy.worker.Worker-WorkerID.pid

Once you know the WorkerID from the PID file name, run the following command:

sbin/spark-daemon.sh stop org.apache.spark.deploy.worker.Worker WorkerID
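
For example, assuming the default /tmp PID directory and a worker started as
instance 1 (the user name embedded in the file name will vary):

ls /tmp/spark-*.pid
# e.g. /tmp/spark-youruser-org.apache.spark.deploy.worker.Worker-1.pid
sbin/spark-daemon.sh stop org.apache.spark.deploy.worker.Worker 1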


2. How can I check the cluster status from the command line?
   Is there any way to confirm that the Master and all Workers are up and
working without using the Web UI?


AFAIK, there are no command-line tools for checking the status of a
standalone cluster.

Instead, you can use a special URL like the following:

http://<master or worker hostname>:<webui-port>/json

You can get the Master and Worker status as JSON-formatted data.
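
For example, with the default web UI ports (8080 for the Master, 8081 for a
Worker), replacing the placeholders with real hostnames:

curl http://<master-hostname>:8080/json
curl http://<worker-hostname>:8081/json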

- Kousuke

(2014/11/18 0:27), Kenichi Maehashi wrote:

Hi,

I'm operating Spark in standalone cluster configuration (3 slaves) and
have some question.

1. How can I stop a slave on a specific node?
Under the `sbin/` directory, there are
`start-{all,master,slave,slaves}` and `stop-{all,master,slaves}`, but
no `stop-slave`. Is there any way to stop a specific (e.g., the
2nd) slave via the command line?

2. How can I check the cluster status from the command line?
Is there any way to confirm that the Master and all Workers are up and
working without using the Web UI?

Thanks in advance!




-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Please add our meetup home page in Japan.

2015-07-16 Thread Kousuke Saruta
Hi folks.

We have lots of Spark enthusiasts, and several organizations have held talk
events in Tokyo, Japan.
Now we're going to unify those events and have created our home page on
meetup.com.

http://www.meetup.com/Tokyo-Spark-Meetup/

Could you add this to the list?
Thanks.

- Kousuke Saruta

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark 1.6.1 binary pre-built for Hadoop 2.6 may be broken

2016-04-04 Thread Kousuke Saruta
Hi all,

I noticed the binary pre-built for Hadoop 2.6, which we can download from
spark.apache.org/downloads.html (Direct Download), may be broken.
I couldn't decompress at least the following 4 tgzs with the "tar xfzv" command,
and their MD5 checksums didn't match.

* spark-1.6.1-bin-hadoop2.6.tgz
* spark-1.6.1-bin-hadoop2.4.tgz
* spark-1.6.1-bin-hadoop2.3.tgz
* spark-1.6.1-bin-cdh4.tgz

The following 3 tgzs were decompressed successfully:

* spark-1.6.1-bin-hadoop1.tgz
* spark-1.6.1-bin-without-hadoop.tgz
* spark-1.6.1.tgz
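
For anyone who wants to reproduce the check, something like this should do
(just a sketch; compare the md5sum output against the checksum published
alongside each download):

tar tzf spark-1.6.1-bin-hadoop2.6.tgz > /dev/null   # lists the archive; fails if it is corrupt
md5sum spark-1.6.1-bin-hadoop2.6.tgz                # compare with the published MD5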

Regards,
Kousuke


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org