RE: spark slave cannot execute without admin permission on windows

2015-02-19 Thread Judy Nash
+ dev mailing list

If this is supposed to work, is there a regression then?

The Spark core code shows that the permission on the file copied to \work is
set to a+x at line 442 of Utils.scala:
https://github.com/apache/spark/blob/b271c265b742fa6947522eda4592e9e6a7fd1f3a/core/src/main/scala/org/apache/spark/util/Utils.scala
The example jar I used had all permissions, including Read & Execute, prior
to spark-submit:
[screenshot: jar file permissions before spark-submit, including Read & Execute]
However, after the jar is copied to the worker node's \work folder, only
limited permissions are left on it, with no execute right.
[screenshot: jar file permissions after the copy to \work, with execute missing]
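
(A minimal way to reproduce this check from code — a sketch using only the
JDK; pass it the path of a jar under the worker's \work directory:)

    import java.io.File

    object CheckJarPerms {
      def main(args: Array[String]): Unit = {
        val jar = new File(args(0))
        println(s"readable:   ${jar.canRead}")
        println(s"writable:   ${jar.canWrite}")
        println(s"executable: ${jar.canExecute}")
        // Blunt debugging workaround: try to restore the execute bit.
        // setExecutable can return false on Windows when the process
        // lacks the rights to change the file's ACL.
        if (!jar.canExecute) {
          println(s"setExecutable succeeded: ${jar.setExecutable(true, false)}")
        }
      }
    }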

From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Wednesday, February 18, 2015 10:40 PM
To: Judy Nash
Cc: u...@spark.apache.org
Subject: Re: spark slave cannot execute without admin permission on windows

You don't need admin permission; just make sure all those jars have
execute permission (read/write access).

Thanks
Best Regards

On Thu, Feb 19, 2015 at 11:30 AM, Judy Nash
judyn...@exchange.microsoft.com wrote:
Hi,

Is it possible to configure Spark to run without admin permission on Windows?

My current setup runs master & slave successfully with admin permission.
However, if I downgrade the permission level from admin to user, SparkPi
fails with the following exception on the slave node:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to
stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost
task 0.3 in stage 0.0 (TID 9,
workernode0.jnashsparkcurr2.d10.internal.cloudapp.net):
java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi$$anonfun$1

at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)

Upon investigation, it appears that the SparkPi jar under
spark_home\worker\appname\*.jar does not have the execute permission set,
leaving Spark unable to find the class.

Advice would be very much appreciated.

Thanks,
Judy




Re: Replacing Jetty with TomCat

2015-02-19 Thread Niranda Perera
Hi Sean,
The issue we have here is that all our products are based on a single
platform, and we try to make all our products coherent with our platform as
much as possible. So having two web services in one instance would not be
a very elegant solution. That is why we were seeking a way to switch it to
Tomcat. But as I understand it, that is not readily supported, hence we will
have to accept it as it is.

If we are not using the Spark UIs, is it possible to disable them and
prevent the Jetty server from starting, yet still use the core Spark
functionality?

Hi Corey,
Thank you for your ideas. Our biggest concern here was that it starts a new
webserver inside Spark; opening up new ports etc. might be seen as a
security threat when it comes to commercial distributions.

cheers



On Wed, Feb 18, 2015 at 3:25 PM, Sean Owen so...@cloudera.com wrote:

 I do not think it makes sense to make the web server configurable.
 Mostly because there's no real problem in running an HTTP service
 internally based on Netty while you run your own HTTP service based on
 something else like Tomcat. What's the problem?

 On Wed, Feb 18, 2015 at 3:14 AM, Niranda Perera
 niranda.per...@gmail.com wrote:
  Hi Sean,
  The main issue we have is, running two web servers in a single product.
 we
  think it would not be an elegant solution.
 
  Could you please point me to the main areas where jetty server is tightly
  coupled or extension points where I could plug tomcat instead of jetty?
  If successful I could contribute it to the spark project. :-)
 
  cheers
 
 
 
  On Mon, Feb 16, 2015 at 4:51 PM, Sean Owen so...@cloudera.com wrote:
 
  There's no particular reason you have to remove the embedded Jetty
  server, right? it doesn't prevent you from using it inside another app
  that happens to run in Tomcat. You won't be able to switch it out
  without rewriting a fair bit of code, no, but you don't need to.
 
  On Mon, Feb 16, 2015 at 5:08 AM, Niranda Perera
  niranda.per...@gmail.com wrote:
   Hi,
  
   We are thinking of integrating Spark server inside a product. Our
   current
   product uses Tomcat as its webserver.
  
   Is it possible to switch the Jetty webserver in Spark to Tomcat
   off-the-shelf?
  
   Cheers
  
   --
   Niranda
 
 
 
 
  --
  Niranda




-- 
Niranda


Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-19 Thread Michael Armbrust

 P.S: For some reason replacing "import sqlContext.createSchemaRDD" with
 "import sqlContext.implicits._" doesn't do the implicit conversions.
 registerTempTable gives a syntax error. I will dig deeper tomorrow. Has
 anyone seen this?


We will write up a whole migration guide before the final release, but I
can quickly explain this one.  We made the implicit conversion
significantly less broad to avoid the chance of confusing conflicts.
However, now you have to call .toDF in order to force RDDs to become
DataFrames.
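
(A minimal before/after sketch of the change, assuming a spark-shell session
where sc and sqlContext already exist; the case class is made up:)

    // Spark 1.2: "import sqlContext.createSchemaRDD" converted RDDs implicitly.
    // Spark 1.3: the implicits are narrower, so convert explicitly with .toDF().
    import sqlContext.implicits._

    case class Employee(name: String, state: String)
    val employees = sc.parallelize(Seq(Employee("alice", "WA")))

    val df = employees.toDF()          // explicit RDD -> DataFrame conversion
    df.registerTempTable("Employees")  // then query via sqlContext.sql(...)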


Re: Hive SKEWED feature supported in Spark SQL ?

2015-02-19 Thread Michael Armbrust

 1) Is SKEWED BY honored? If so, has anyone run into directories not being
 created?


It is not.

2) If it is not honored, does it matter? Hive introduced this feature to
 better handle joins where tables had a skewed distribution on the keys
 joined on, so that the single mapper handling one of the keys didn't hold
 up the whole process. Could that happen in Spark / Spark SQL?


It could matter for very skewed data, though I have not heard many
complaints.  We could consider adding it in the future if people are having
problems with skewed data.
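
(In the meantime, a common hand-rolled workaround for skewed join keys is to
salt them. A sketch, not a Spark SQL feature — big and small are assumed to
be pair RDDs, with small small enough to replicate once per salt value:)

    val SALTS = 8
    val saltedBig = big.map { case (k, v) =>
      ((k, scala.util.Random.nextInt(SALTS)), v)   // scatter each hot key
    }
    val saltedSmall = small.flatMap { case (k, w) =>
      (0 until SALTS).map(s => ((k, s), w))        // replicate per salt
    }
    // Each hot key's work is now spread over up to SALTS partitions.
    val joined = saltedBig.join(saltedSmall)
      .map { case ((k, _), vw) => (k, vw) }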


Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-19 Thread Krishna Sankar
Excellent. Explicit toDF() works.
a) employees.toDF().registerTempTable("Employees") - works
b) Also affects saveAsParquetFile - orders.toDF().saveAsParquetFile

Adding to my earlier tests:
4.0 SQL from Scala and Python
4.1 result = sqlContext.sql("SELECT * from Employees WHERE State = 'WA'") OK
4.2 result = sqlContext.sql("SELECT
OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
4.3 result = sqlContext.sql("SELECT ShipCountry, Sum(OrderDetails.UnitPrice
* Qty * Discount) AS ProductSales FROM Orders INNER JOIN OrderDetails ON
Orders.OrderID = OrderDetails.OrderID GROUP BY ShipCountry") OK
4.4 saveAsParquetFile OK
4.5 Read and verify the 4.4 save - sqlContext.parquetFile,
registerTempTable, sql OK

Cheers & thanks, Michael
k/



On Thu, Feb 19, 2015 at 12:02 PM, Michael Armbrust mich...@databricks.com
wrote:

 P.S: For some reason replacing "import sqlContext.createSchemaRDD" with
 "import sqlContext.implicits._" doesn't do the implicit conversions.
 registerTempTable gives a syntax error. I will dig deeper tomorrow. Has
 anyone seen this?


 We will write up a whole migration guide before the final release, but I
 can quickly explain this one.  We made the implicit conversion
 significantly less broad to avoid the chance of confusing conflicts.
 However, now you have to call .toDF in order to force RDDs to become
 DataFrames.



Re: Replacing Jetty with TomCat

2015-02-19 Thread Ewan Higgs

To add to Sean and Reynold's point:

Please correct me if I'm wrong, but Spark depends on hadoop-common, which
also uses Jetty in the HttpServer2 code. So even if you remove Jetty from
Spark by making it an optional dependency, it will be pulled in by Hadoop.


So a program that depends on a hypothetical Spark-on-Tomcat would still
pull in the Jetty jars.


-Ewan


On 19/02/15 10:23, Sean Owen wrote:

Sure, but you are not using Netty at all. It's invisible to you. It's
not as if you have to set up and maintain a Jetty container. I don't
think your single platform for your apps is relevant.

You can turn off the UI, but as Reynold said, the HTTP servers are
also part of the core data transport functionality and you can't turn
that off. It's not merely unsupported to swap this out with an
arbitrary container, it's not clear it would work with Tomcat without
re-integrating with its behavior and tuning. But it also shouldn't
matter to anyone.

On Thu, Feb 19, 2015 at 8:11 AM, Niranda Perera
niranda.per...@gmail.com wrote:

Hi Sean,
The issue we have here is that all our products are based on a single
platform and we try to make all our products coherent with our platform as
much as possible. so, having two web services in one instance would not be a
very elegant solution. That is why we were seeking a way to switch it to
Tomcat. But as I understand, it is not readily supported, hence we will have
to accept it as it is.

If we are not using the Spark UIs, is it possible to disable the UIs and
prevent the jetty server from starting, but yet use the core spark
functionality?

Hi Corey,
thank you for your ideas. Our biggest concern here was that it starts a new
webserver inside spark. opening up new ports etc. might be seen as security
threats when it comes to commercial distributions.

cheers



On Wed, Feb 18, 2015 at 3:25 PM, Sean Owen so...@cloudera.com wrote:

I do not think it makes sense to make the web server configurable.
Mostly because there's no real problem in running an HTTP service
internally based on Netty while you run your own HTTP service based on
something else like Tomcat. What's the problem?

On Wed, Feb 18, 2015 at 3:14 AM, Niranda Perera
niranda.per...@gmail.com wrote:

Hi Sean,
The main issue we have is, running two web servers in a single product.
we
think it would not be an elegant solution.

Could you please point me to the main areas where jetty server is
tightly
coupled or extension points where I could plug tomcat instead of jetty?
If successful I could contribute it to the spark project. :-)

cheers



On Mon, Feb 16, 2015 at 4:51 PM, Sean Owen so...@cloudera.com wrote:

There's no particular reason you have to remove the embedded Jetty
server, right? it doesn't prevent you from using it inside another app
that happens to run in Tomcat. You won't be able to switch it out
without rewriting a fair bit of code, no, but you don't need to.

On Mon, Feb 16, 2015 at 5:08 AM, Niranda Perera
niranda.per...@gmail.com wrote:

Hi,

We are thinking of integrating Spark server inside a product. Our
current
product uses Tomcat as its webserver.

Is it possible to switch the Jetty webserver in Spark to Tomcat
off-the-shelf?

Cheers

--
Niranda




--
Niranda




--
Niranda




Hive SKEWED feature supported in Spark SQL ?

2015-02-19 Thread The Watcher
I have done some testing of inserting into tables defined in Hive using 1.2,
and I can see that the PARTITION clause is honored: data files get created
in multiple subdirectories correctly.

I tried the SKEWED BY ... ON ... STORED AS DIRECTORIES clause on the CREATE
TABLE statement, but I didn't see subdirectories being created in that case.
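
(For concreteness, the kind of DDL under test — Hive syntax, with invented
table and column names; a sketch, assuming a HiveContext-backed sqlContext:)

    sqlContext.sql("""
      CREATE TABLE clicks (referrer STRING, ts STRING)
      PARTITIONED BY (dt STRING)
      SKEWED BY (referrer) ON ('hot-site-1', 'hot-site-2')
      STORED AS DIRECTORIES
    """)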

1) Is SKEWED BY honored? If so, has anyone run into directories not being
created?

2) If it is not honored, does it matter? Hive introduced this feature to
better handle joins where tables had a skewed distribution on the keys joined
on, so that the single mapper handling one of the keys didn't hold up the
whole process. Could that happen in Spark / Spark SQL?

Thanks


Re: Replacing Jetty with TomCat

2015-02-19 Thread Sean Owen
Sure, but you are not using Netty at all. It's invisible to you. It's
not as if you have to set up and maintain a Jetty container. I don't
think your single platform for your apps is relevant.

You can turn off the UI, but as Reynold said, the HTTP servers are
also part of the core data transport functionality and you can't turn
that off. It's not merely unsupported to swap this out with an
arbitrary container, it's not clear it would work with Tomcat without
re-integrating with its behavior and tuning. But it also shouldn't
matter to anyone.
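
(For the earlier question about disabling the UI, that part is a one-line
configuration — a minimal sketch using the standard spark.ui.enabled
property; as noted above, the data-transport servers stay up regardless:)

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("no-ui-app")              // hypothetical app name
      .set("spark.ui.enabled", "false")     // web UI off; core transport unaffected
    val sc = new SparkContext(conf)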

On Thu, Feb 19, 2015 at 8:11 AM, Niranda Perera
niranda.per...@gmail.com wrote:
 Hi Sean,
 The issue we have here is that all our products are based on a single
 platform and we try to make all our products coherent with our platform as
 much as possible. so, having two web services in one instance would not be a
 very elegant solution. That is why we were seeking a way to switch it to
 Tomcat. But as I understand, it is not readily supported, hence we will have
 to accept it as it is.

 If we are not using the Spark UIs, is it possible to disable the UIs and
 prevent the jetty server from starting, but yet use the core spark
 functionality?

 Hi Corey,
 thank you for your ideas. Our biggest concern here was that it starts a new
 webserver inside spark. opening up new ports etc. might be seen as security
 threats when it comes to commercial distributions.

 cheers



 On Wed, Feb 18, 2015 at 3:25 PM, Sean Owen so...@cloudera.com wrote:

 I do not think it makes sense to make the web server configurable.
 Mostly because there's no real problem in running an HTTP service
 internally based on Netty while you run your own HTTP service based on
 something else like Tomcat. What's the problem?

 On Wed, Feb 18, 2015 at 3:14 AM, Niranda Perera
 niranda.per...@gmail.com wrote:
  Hi Sean,
  The main issue we have is, running two web servers in a single product.
  we
  think it would not be an elegant solution.
 
  Could you please point me to the main areas where jetty server is
  tightly
  coupled or extension points where I could plug tomcat instead of jetty?
  If successful I could contribute it to the spark project. :-)
 
  cheers
 
 
 
  On Mon, Feb 16, 2015 at 4:51 PM, Sean Owen so...@cloudera.com wrote:
 
  There's no particular reason you have to remove the embedded Jetty
  server, right? it doesn't prevent you from using it inside another app
  that happens to run in Tomcat. You won't be able to switch it out
  without rewriting a fair bit of code, no, but you don't need to.
 
  On Mon, Feb 16, 2015 at 5:08 AM, Niranda Perera
  niranda.per...@gmail.com wrote:
   Hi,
  
   We are thinking of integrating Spark server inside a product. Our
   current
   product uses Tomcat as its webserver.
  
   Is it possible to switch the Jetty webserver in Spark to Tomcat
   off-the-shelf?
  
   Cheers
  
   --
   Niranda
 
 
 
 
  --
  Niranda




 --
 Niranda




Have Friedman's glmnet algo running in Spark

2015-02-19 Thread mike
Dev List,
A couple of colleagues and I have gotten several versions of the glmnet algo
coded and running on Spark RDDs. The glmnet algo
(http://www.jstatsoft.org/v33/i01/paper) is a very fast algorithm for
generating coefficient paths solving penalized regression with elastic net
penalties. The algorithm runs fast by taking an approach that generates
solutions for a wide variety of penalty parameters. We're able to integrate
it into the MLlib class structure a couple of different ways. The algorithm
may fit better into the new pipeline structure, since it naturally returns a
multitude of models (corresponding to different values of the penalty
parameters). That appears to fit the pipeline better than MLlib linear
regression does (for example).
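
(For reference, the elastic-net objective from Friedman's paper linked above;
glmnet traces the coefficient path over a decreasing sequence of lambda
values, which is why a single fit naturally yields many models:)

    \min_{\beta_0,\,\beta}\ \frac{1}{2N}\sum_{i=1}^{N}\left(y_i-\beta_0-x_i^{T}\beta\right)^2
      +\lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2+\alpha\,\lVert\beta\rVert_1\right]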

We've got regression running with the speed optimizations that Friedman 
recommends. We'll start working on the logistic regression version next.

We're eager to make the code available as open source and would like to get 
some feedback about how best to do that. Any thoughts?
Mike Bowles.




Spark SQL, Hive Parquet data types

2015-02-19 Thread The Watcher
Still trying to get my head around Spark SQL & Hive.

1) Let's assume I *only* use Spark SQL to create and insert data into Hive
tables, declared in a Hive metastore.

Does it matter at all if Hive supports the data types I need with Parquet,
or is all that matters what Catalyst & Spark's Parquet relation support?

Case in point: timestamps & Parquet
* Parquet now supports them as per
https://github.com/Parquet/parquet-mr/issues/218
* Hive only supports them in 0.14
So would I be able to read/write timestamps natively in Spark 1.2? Spark
1.3?

I have found this thread
http://apache-spark-user-list.1001560.n3.nabble.com/timestamp-not-implemented-yet-td15414.html
which seems to indicate that the data types supported by Hive would matter
to Spark SQL.
If so, why is that? Doesn't the read path go through Spark SQL to read the
Parquet file? (A sketch of the two read paths follows question 2 below.)

2) Is there planned support for Hive 0.14?
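
(The two read paths in question, sketched with the 1.2-era API; the path and
table names are invented:)

    import org.apache.spark.sql.hive.HiveContext

    val sqlContext = new HiveContext(sc)  // assumes an existing SparkContext sc

    // Native path: Catalyst's own Parquet relation reads the files directly.
    val direct = sqlContext.parquetFile("/data/events.parquet")
    direct.registerTempTable("events_direct")

    // Metastore path: goes through the Hive table definition, so Hive's
    // type support (e.g. timestamps only as of 0.14) can come into play.
    val viaHive = sqlContext.sql("SELECT * FROM events")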

Thanks


Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-19 Thread Timothy Chen
+1 (non-binding)

Tested Mesos coarse- and fine-grained modes with a simple shuffle/map task
on a 4-node Mesos cluster.

Will be testing with a more complete suite (i.e. spark-perf) once the
infrastructure is set up to do so.

Tim

On Thu, Feb 19, 2015 at 12:50 PM, Krishna Sankar ksanka...@gmail.com wrote:
 Excellent. Explicit toDF() works.
 a) employees.toDF().registerTempTable("Employees") - works
 b) Also affects saveAsParquetFile - orders.toDF().saveAsParquetFile

 Adding to my earlier tests:
 4.0 SQL from Scala and Python
 4.1 result = sqlContext.sql("SELECT * from Employees WHERE State = 'WA'") OK
 4.2 result = sqlContext.sql("SELECT
 OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
 JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
 4.3 result = sqlContext.sql("SELECT ShipCountry, Sum(OrderDetails.UnitPrice
 * Qty * Discount) AS ProductSales FROM Orders INNER JOIN OrderDetails ON
 Orders.OrderID = OrderDetails.OrderID GROUP BY ShipCountry") OK
 4.4 saveAsParquetFile OK
 4.5 Read and verify the 4.4 save - sqlContext.parquetFile,
 registerTempTable, sql OK

 Cheers & thanks, Michael
 k/



 On Thu, Feb 19, 2015 at 12:02 PM, Michael Armbrust mich...@databricks.com
 wrote:

 P.S: For some reason replacing "import sqlContext.createSchemaRDD" with
 "import sqlContext.implicits._" doesn't do the implicit conversions.
 registerTempTable gives a syntax error. I will dig deeper tomorrow. Has
 anyone seen this?


 We will write up a whole migration guide before the final release, but I
 can quickly explain this one.  We made the implicit conversion
 significantly less broad to avoid the chance of confusing conflicts.
 However, now you have to call .toDF in order to force RDDs to become
 DataFrames.





Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-19 Thread Corey Nolet
+1 (non-binding)

- Verified signatures using [1]
- Built on MacOSX Yosemite
- Built on Fedora 21

Each build was run against Hadoop 2.4 with the yarn, hive, and
hive-thriftserver profiles
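
(For reference, a typical invocation for those profiles — a sketch based on
the 1.3-era build documentation; exact flags may differ:)

    mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package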

I am having trouble getting all the tests to pass in a single run on both
machines, but we have this same problem on other projects as well.

[1] https://github.com/cjnolet/nexus-staging-gpg-verify


On Wed, Feb 18, 2015 at 6:25 PM, Sean Owen so...@cloudera.com wrote:

 On Wed, Feb 18, 2015 at 6:13 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Patrick this link gives a 404:
  https://people.apache.org/keys/committer/pwendell.asc
 
  Works for me. Maybe it's some ephemeral issue?

 Yes, works now; I swear it didn't before! That's all set now. The
 signing key is in that file.
