[POWERED BY] Please add our organization

2015-09-23 Thread Oleg Shirokikh
Name: Frontline Systems Inc.
URL: www.solver.com

Description:
*  We built an interface between Microsoft Excel and Apache Spark - bringing 
Big Data from the clusters to Excel and enabling tools ranging from simple charts 
and Power View dashboards to add-ins for machine learning and predictive 
analytics, Monte Carlo simulation and risk analysis, and linear and nonlinear 
optimization. Using the Spark Core API and Spark SQL to draw representative 
samples and summarize large datasets, it's now possible to extract "actionable 
small data" from a Big Data cluster and make it usable by business analysts who 
lack programming expertise (a rough sketch of this pattern follows below).
*  See our blog post presenting the analysis of 20+ years of airline flight 
data using Excel only.
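
To make the sampling-and-summarizing pattern above concrete, here is a minimal 
Scala sketch (not the actual product code). It assumes Spark 1.x with a 
registered temp table; the table and column names ("flights", "carrier", 
"dep_delay") are placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Minimal sketch: sample a large registered table and compute a small summary
// that can be pulled into Excel. Table and column names are placeholders.
object SampleAndSummarize {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SampleAndSummarize"))
    val sqlContext = new SQLContext(sc)

    val flights = sqlContext.table("flights")  // assumes the table is already registered

    // Draw a ~1% representative sample without replacement (DataFrame API).
    val sample = flights.sample(withReplacement = false, fraction = 0.01, seed = 42L)

    // Summarize the full dataset with Spark SQL aggregates.
    val summary = sqlContext.sql(
      "SELECT carrier, COUNT(*) AS n, AVG(dep_delay) AS avg_delay " +
      "FROM flights GROUP BY carrier")

    // Both results are small enough to collect back to the driver for export.
    sample.limit(10000).collect()
    summary.collect()

    sc.stop()
  }
}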



RE: Spark SQL: STDDEV working in Spark Shell but not in a standalone app

2015-05-11 Thread Oleg Shirokikh
Michael – Thanks for the response – that’s right, I hadn’t noticed that the Spark 
Shell instantiates sqlContext as a HiveContext rather than an actual Spark 
SQLContext… I’ve seen the PR to add STDDEV to data frames. Can I expect this to 
be added to Spark SQL in Spark 1.4, or is it still uncertain? It would be really 
helpful to know, in order to understand whether I have to change existing code to 
use HiveContext instead of SQLContext (which would be undesirable)… Thanks!
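
For what it's worth, a workaround sketch (not from this thread) that keeps the 
plain SQLContext: compute a population standard deviation from AVG aggregates, 
which the native SQL parser in 1.3.x does support. Table and column names follow 
the original question.

import org.apache.spark.sql.SQLContext

// Sketch: population standard deviation without HiveContext, using only AVG.
// Assumes a registered table "table" with a numeric column "col1", as in the
// original question.
def stddevWithoutHive(sqlContext: SQLContext): Double = {
  val variance = sqlContext
    .sql("SELECT AVG(col1 * col1) - AVG(col1) * AVG(col1) AS variance FROM table")
    .first()
    .getDouble(0)
  math.sqrt(variance)
}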

From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Saturday, May 09, 2015 11:32 AM
To: Oleg Shirokikh
Cc: user
Subject: Re: Spark SQL: STDDEV working in Spark Shell but not in a standalone 
app

Are you perhaps using a HiveContext in the shell but a SQLContext in your app?  
I don't think we natively implement stddev until 1.4.0
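
For reference, a minimal sketch of the distinction (assuming Spark 1.3.x and the 
spark-hive artifact on the classpath; table and column names follow the original 
question):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

// Sketch: the same query behaves differently depending on the context type.
val sc = new SparkContext(new SparkConf().setAppName("StddevContexts"))

// Plain SQLContext (what a standalone app typically builds): in 1.3.x STDDEV is
// not a registered function here, so the query fails with "key not found: STDDEV".
val sqlContext = new SQLContext(sc)

// HiveContext (what spark-shell binds to `sqlContext` when built with Hive):
// STDDEV resolves through the Hive function registry, so the query runs.
val hiveContext = new HiveContext(sc)
val stddev = hiveContext.sql("SELECT STDDEV(col1) FROM table").collect()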

On Fri, May 8, 2015 at 4:44 PM, barmaley <o...@solver.com> wrote:
Given a registered table from a data frame, I'm able to execute queries like
sqlContext.sql("SELECT STDDEV(col1) FROM table") from the Spark Shell just fine.
However, when I run exactly the same code in a standalone app on a cluster,
it throws an exception: java.util.NoSuchElementException: key not found:
STDDEV...

Is STDDEV among the default functions in Spark SQL? I'd appreciate it if you
could comment on what's going on with the above.

Thanks






RE: FW: Submitting jobs to Spark EC2 cluster remotely

2015-02-23 Thread Oleg Shirokikh
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.security.PrivilegedActionException: 
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 
seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:127)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:59)
... 7 more
/***/


When I go into the worker UI from the Master page, I can see the RUNNING executor 
- it's in the LOADING state. Here is its stderr:

/***/
15/02/23 18:15:05 INFO executor.CoarseGrainedExecutorBackend: Registered signal 
handlers for [TERM, HUP, INT]
15/02/23 18:15:06 INFO spark.SecurityManager: Changing view acls to: root,oleg
15/02/23 18:15:06 INFO spark.SecurityManager: Changing modify acls to: root,oleg
15/02/23 18:15:06 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(root, oleg); users 
with modify permissions: Set(root, oleg)
15/02/23 18:15:06 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/02/23 18:15:06 INFO Remoting: Starting remoting
15/02/23 18:15:06 INFO Remoting: Remoting started; listening on addresses 
:[akka.tcp://driverpropsfetc...@ip-172-31-33-195.us-west-2.compute.internal:34609]
15/02/23 18:15:06 INFO util.Utils: Successfully started service 
'driverPropsFetcher' on port 34609.
/***/


So it seems that there is a problem with starting executors...


Hopefully this clarifies the environment and workflow. I'd be happy to provide 
any additional information.

Again, thanks a lot for your help and time looking into this. Although I know the 
perfectly legitimate way to work with a Spark EC2 cluster (running the driver 
within the cluster), it's extremely interesting to understand how remoting works 
with Spark. And in general it would be very useful to be able to submit jobs 
remotely.

Thanks,
Oleg


-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com] 
Sent: Monday, February 23, 2015 1:22 AM
To: Oleg Shirokikh
Cc: user@spark.apache.org
Subject: Re: FW: Submitting jobs to Spark EC2 cluster remotely

What happens if you submit from the master node itself on EC2 (in client mode)? 
Does that work? What about in cluster mode?

It would be helpful if you could print the full command that the executor is 
failing with. That might show that spark.driver.host is being set strangely. IIRC 
we print the launch command before starting the executor.

Overall the standalone cluster mode is not as well tested across environments 
with asymmetric connectivity. I didn't actually realize that akka (which the 
submission uses) can handle this scenario. But it does seem like the job is 
submitted, it's just not starting correctly.

- Patrick

On Mon, Feb 23, 2015 at 1:13 AM, Oleg Shirokikh o...@solver.com wrote:
 Patrick,

 I haven't changed the configs much. I just executed the ec2 script to create a 
 cluster with 1 master and 2 slaves. Then I try to submit the jobs from a remote 
 machine, leaving everything configured by the Spark scripts at its defaults. 
 I've tried to change configs as suggested in other mailing-list and Stack 
 Overflow threads (such as setting spark.driver.host, etc...), removed 
 (hopefully) all security/firewall restrictions from AWS, etc., but it didn't help.

 I think that what you are saying is exactly the issue: on my master node UI 
 at the bottom I can see the list of Completed Drivers, all in ERROR state...

 Thanks,
 Oleg

 -Original Message-
 From: Patrick Wendell [mailto:pwend...@gmail.com]
 Sent: Monday, February 23, 2015 12:59 AM
 To: Oleg Shirokikh
 Cc: user@spark.apache.org
 Subject: Re: Submitting jobs to Spark EC2 cluster remotely

 Can you list the other configs that you are setting? It looks like the executor 
 can't communicate back to the driver. I'm actually not sure it's a good idea 
 to set spark.driver.host here; you want to let Spark set that automatically.

 - Patrick

 On Mon, Feb 23, 2015 at 12:48 AM, Oleg Shirokikh o...@solver.com wrote:
 Dear

FW: Submitting jobs to Spark EC2 cluster remotely

2015-02-23 Thread Oleg Shirokikh
Patrick,

I haven't changed the configs much. I just executed the ec2 script to create a 
cluster with 1 master and 2 slaves. Then I try to submit the jobs from a remote 
machine, leaving everything configured by the Spark scripts at its defaults. 
I've tried to change configs as suggested in other mailing-list and Stack 
Overflow threads (such as setting spark.driver.host, etc...), removed 
(hopefully) all security/firewall restrictions from AWS, etc., but it didn't help.

I think that what you are saying is exactly the issue: on my master node UI at 
the bottom I can see the list of Completed Drivers, all in ERROR state...

Thanks,
Oleg

-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com] 
Sent: Monday, February 23, 2015 12:59 AM
To: Oleg Shirokikh
Cc: user@spark.apache.org
Subject: Re: Submitting jobs to Spark EC2 cluster remotely

Can you list the other configs that you are setting? It looks like the executor 
can't communicate back to the driver. I'm actually not sure it's a good idea to 
set spark.driver.host here; you want to let Spark set that automatically.

- Patrick
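
(As a sketch of that advice, not from the original thread: set only the 
essentials and let Spark fill in spark.driver.host on its own. The master 
hostname below is a placeholder.)

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: remote submission against a standalone master, per the advice above.
// Only the essentials are set; spark.driver.host is deliberately left for Spark
// to determine. "ec2-master-public-dns" is a placeholder.
val conf = new SparkConf()
  .setAppName("RemoteSubmitExample")
  .setMaster("spark://ec2-master-public-dns:7077")
  // .set("spark.driver.host", "...")   // intentionally NOT set

val sc = new SparkContext(conf)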

On Mon, Feb 23, 2015 at 12:48 AM, Oleg Shirokikh o...@solver.com wrote:
 Dear Patrick,

 Thanks a lot for your quick response. Indeed, following your advice I've 
 uploaded the jar onto S3; the FileNotFoundException is gone now and the job is 
 submitted in cluster deploy mode.

 However, now both modes (client and cluster) fail with the following errors in 
 the executors (they keep exiting/being killed, as I see in the UI):

 15/02/23 08:42:46 ERROR security.UserGroupInformation: 
 PriviledgedActionException as:oleg 
 cause:java.util.concurrent.TimeoutException: Futures timed out after 
 [30 seconds]


 Full log is:

 15/02/23 01:59:11 INFO executor.CoarseGrainedExecutorBackend: 
 Registered signal handlers for [TERM, HUP, INT]
 15/02/23 01:59:12 INFO spark.SecurityManager: Changing view acls to: 
 root,oleg
 15/02/23 01:59:12 INFO spark.SecurityManager: Changing modify acls to: 
 root,oleg
 15/02/23 01:59:12 INFO spark.SecurityManager: SecurityManager: 
 authentication disabled; ui acls disabled; users with view 
 permissions: Set(root, oleg); users with modify permissions: Set(root, 
 oleg)
 15/02/23 01:59:12 INFO slf4j.Slf4jLogger: Slf4jLogger started
 15/02/23 01:59:12 INFO Remoting: Starting remoting
 15/02/23 01:59:13 INFO Remoting: Remoting started; listening on addresses 
 :[akka.tcp://driverpropsfetc...@ip-172-31-33-194.us-west-2.compute.internal:39379]
 15/02/23 01:59:13 INFO util.Utils: Successfully started service 
 'driverPropsFetcher' on port 39379.
 15/02/23 01:59:43 ERROR security.UserGroupInformation: PriviledgedActionException 
 as:oleg cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
 Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
 at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)
 at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:115)
 at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
 at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
 Caused by: java.security.PrivilegedActionException: 
 java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
 ... 4 more
 Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 
 seconds]
 at 
 scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
 at 
 scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
 at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
 at 
 scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
 at scala.concurrent.Await$.result(package.scala:107)
 at 
 org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:127)
 at 
 org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
 at 
 org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:59)
 ... 7 more




 -Original Message-
 From: Patrick Wendell [mailto:pwend...@gmail.com]
 Sent: Monday, February 23, 2015 12:17 AM
 To: Oleg Shirokikh
 Subject: Re: Submitting jobs to Spark EC2 cluster remotely

 The reason is that the file needs to be in a globally visible 
 filesystem from which the master node can download it. So it needs to be on 
 S3, for instance, rather than on your local
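
(A sketch of what that means on the application side, not from the original 
thread; the bucket and jar name are placeholders. The point is simply that the 
jar must be referenced from storage every node can reach, such as S3 or HDFS, 
rather than a path on the submitting machine.)

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: in cluster deploy mode the application jar must live in a globally
// visible filesystem. The S3 bucket and key below are placeholders.
val conf = new SparkConf()
  .setAppName("ClusterModeExample")
  .setJars(Seq("s3n://my-bucket/jars/app-assembly.jar"))

val sc = new SparkContext(conf)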

RE: Creating Apache Spark-powered “As Service” applications

2015-01-16 Thread Oleg Shirokikh
Thanks a lot, Robert – I’ll definitely investigate this and will probably come 
back with questions.

P.S. I’m new to this Spark forum. I’m getting responses through email, but they 
are not appearing as “replies” in the thread, which is kind of inconvenient. Is 
there something I should tweak?

Thanks,
Oleg

From: Robert C Senkbeil [mailto:rcsen...@us.ibm.com]
Sent: Friday, January 16, 2015 12:21 PM
To: Oleg Shirokikh
Cc: user@spark.apache.org
Subject: Re: Creating Apache Spark-powered “As Service” applications


Hi,

You can take a look at the Spark Kernel project: 
https://github.com/ibm-et/spark-kernel

The Spark Kernel's goal is to serve as the foundation for interactive 
applications. The project provides a client library in Scala that abstracts 
connecting to the kernel (containing a Spark Context), which can be embedded 
into a web application. We demonstrated this at StrataConf when we embedded the 
Spark Kernel client into a Play application to provide an interactive web 
application that communicates with Spark via the Spark Kernel (hosting a Spark 
Context).

A getting started section can be found here: 
https://github.com/ibm-et/spark-kernel/wiki/Getting-Started-with-the-Spark-Kernel

If you have any other questions, feel free to email me or communicate over our 
mailing list:

spark-ker...@googlegroups.com

https://groups.google.com/forum/#!forum/spark-kernel

Signed,
Chip Senkbeil
IBM Emerging Technology Software Engineer


From: olegshirokikh <o...@solver.com>
To: user@spark.apache.org
Date: 01/16/2015 01:32 PM
Subject: Creating Apache Spark-powered “As Service” applications





The question is about ways to create a Windows desktop-based and/or
web-based client application that can connect and talk at run time to a server
hosting a Spark application (either local or on-premise cloud
distributions).

Any language/architecture may work. So far, I've seen two things that might
help with this, but I'm not yet sure whether they would be the best
alternatives, or how they work:

Spark Job Server - https://github.com/spark-jobserver/spark-jobserver -
defines a REST API for Spark
Hue -
http://gethue.com/get-started-with-spark-deploy-spark-server-and-compute-pi-from-your-web-browser/
- uses item 1)

Any advice would be appreciated. A simple toy example program (or steps) that
shows, e.g., how to build such a client that simply creates a Spark Context on a
local machine, reads a text file, and returns basic stats would be the
ideal answer!
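
(As a starting point for that toy example, a hedged sketch, not from the original 
thread, of the Spark-side logic such a client would ultimately invoke: local 
mode, with the file path as a placeholder.)

import org.apache.spark.{SparkConf, SparkContext}

// Toy sketch: create a local Spark context, read a text file, and return basic
// stats (line count and mean line length). The path is a placeholder.
object ToyStatsService {
  def basicStats(path: String): (Long, Double) = {
    val conf = new SparkConf().setAppName("ToyStatsService").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val lengths = sc.textFile(path).map(_.length.toDouble).cache()
      val count = lengths.count()
      val mean = if (count == 0) 0.0 else lengths.sum() / count
      (count, mean)
    } finally {
      sc.stop()
    }
  }
}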


