RE: spark slave cannot execute without admin permission on windows
+ dev mailing list

If this is supposed to work, is there a regression then? The Spark core code sets the permission on the file copied to \work to a+x at line 442 of Utils.scala (https://github.com/apache/spark/blob/b271c265b742fa6947522eda4592e9e6a7fd1f3a/core/src/main/scala/org/apache/spark/util/Utils.scala).

The example jar I used had all permissions, including Read and Execute, prior to spark-submit:

[screenshot: jar permissions before spark-submit]

However, after being copied to the worker node's \work folder, only limited permissions remain on the jar, with no execute right:

[screenshot: jar permissions after copy to \work]

From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Wednesday, February 18, 2015 10:40 PM
To: Judy Nash
Cc: u...@spark.apache.org
Subject: Re: spark slave cannot execute without admin permission on windows

You don't need admin permission; just make sure all those jars have execute (read/write) permission.

Thanks
Best Regards

On Thu, Feb 19, 2015 at 11:30 AM, Judy Nash judyn...@exchange.microsoft.com wrote:

Hi,

Is it possible to configure Spark to run without admin permission on Windows?

My current setup runs master and slave successfully with admin permission. However, if I downgrade the permission level from admin to user, SparkPi fails with the following exception on the slave node:

    Exception in thread main org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 9, workernode0.jnashsparkcurr2.d10.internal.cloudapp.net):
    java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi$$anonfun$1
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)

Upon investigation, it appears that the SparkPi jar under spark_home\worker\appname\*.jar does not have the execute permission set, causing Spark to be unable to find the class.

Advice would be very much appreciated.

Thanks,
Judy
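For anyone hitting this in the meantime, a minimal workaround sketch: a hypothetical helper, not part of Spark, that grants read+execute on every jar under the worker's work directory using the stock Windows icacls tool (the C:\spark\work path is an assumption; java.io.File.setExecutable may not affect NTFS ACLs, hence the shell-out):

    import java.io.File

    // Hypothetical workaround: recursively walk the work directory and grant
    // read+execute on any jar, since the execute right appears to be lost
    // during the copy on Windows.
    def grantExecute(dir: File): Unit = {
      val children = Option(dir.listFiles).getOrElse(Array.empty[File])
      children.foreach { f =>
        if (f.isDirectory) grantExecute(f)
        else if (f.getName.endsWith(".jar")) {
          // *S-1-1-0 is the well-known SID for Everyone (locale-independent)
          val cmd = Seq("icacls", f.getAbsolutePath, "/grant", "*S-1-1-0:RX")
          val exit = new ProcessBuilder(cmd: _*).inheritIO().start().waitFor()
          if (exit != 0) println(s"icacls failed ($exit) on ${f.getPath}")
        }
      }
    }

    grantExecute(new File("""C:\spark\work"""))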
Re: Replacing Jetty with TomCat
Hi Sean,

The issue we have here is that all our products are based on a single platform, and we try to keep all our products coherent with that platform as much as possible. Having two web servers in one instance would not be a very elegant solution. That is why we were seeking a way to switch it to Tomcat. As I understand it, that is not readily supported, so we will have to accept it as it is.

If we are not using the Spark UIs, is it possible to disable them and prevent the Jetty server from starting, yet still use the core Spark functionality?

Hi Corey, thank you for your ideas. Our biggest concern here was that it starts a new web server inside Spark; opening up new ports etc. might be seen as a security threat when it comes to commercial distributions.

cheers

On Wed, Feb 18, 2015 at 3:25 PM, Sean Owen so...@cloudera.com wrote:

I do not think it makes sense to make the web server configurable, mostly because there's no real problem in running an HTTP service internally based on Netty while you run your own HTTP service based on something else like Tomcat. What's the problem?

On Wed, Feb 18, 2015 at 3:14 AM, Niranda Perera niranda.per...@gmail.com wrote:

Hi Sean,

The main issue we have is running two web servers in a single product; we think it would not be an elegant solution. Could you please point me to the main areas where the Jetty server is tightly coupled, or extension points where I could plug in Tomcat instead of Jetty? If successful, I could contribute it to the Spark project. :-)

cheers

On Mon, Feb 16, 2015 at 4:51 PM, Sean Owen so...@cloudera.com wrote:

There's no particular reason you have to remove the embedded Jetty server, right? It doesn't prevent you from using Spark inside another app that happens to run in Tomcat. You won't be able to switch it out without rewriting a fair bit of code, no, but you don't need to.

On Mon, Feb 16, 2015 at 5:08 AM, Niranda Perera niranda.per...@gmail.com wrote:

Hi,

We are thinking of integrating the Spark server inside a product. Our current product uses Tomcat as its web server. Is it possible to switch the Jetty web server in Spark to Tomcat off-the-shelf?

Cheers

--
Niranda
Re: [VOTE] Release Apache Spark 1.3.0 (RC1)
> P.S.: For some reason, replacing import sqlContext.createSchemaRDD with import sqlContext.implicits._ doesn't do the implicit conversions. registerTempTable gives a syntax error. I will dig deeper tomorrow. Has anyone seen this?

We will write up a whole migration guide before the final release, but I can quickly explain this one. We made the implicit conversion significantly less broad to avoid the chance of confusing conflicts. However, you now have to call .toDF in order to turn RDDs into DataFrames.
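To illustrate, a minimal sketch of the 1.3 pattern described above (assuming an existing SparkContext sc; the Person case class and data are made up for the example):

    import org.apache.spark.sql.SQLContext

    case class Person(name: String, age: Int)

    val sqlContext = new SQLContext(sc)
    // In 1.3, this brings in the implicit that adds toDF() to RDDs of
    // case classes, rather than converting them to DataFrames implicitly.
    import sqlContext.implicits._

    val people = sc.parallelize(Seq(Person("Alice", 30), Person("Bob", 25)))

    // The explicit call is now required:
    val df = people.toDF()
    df.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age > 26").show()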
Re: Hive SKEWED feature supported in Spark SQL ?
> 1) Is SKEWED BY honored? If so, has anyone run into directories not being created?

It is not.

> 2) If it is not honored, does it matter? Hive introduced this feature to better handle joins where tables had a skewed distribution on the keys joined on, so that the single mapper handling one of the keys didn't hold up the whole process. Could that happen in Spark / Spark SQL?

It could matter for very skewed data, though I have not heard many complaints. We could consider adding it in the future if people are having problems with skewed data.
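In the meantime, a common manual mitigation for exactly that hot-key problem is to salt the skewed join key by hand. A minimal sketch (assuming a SparkContext sc; the datasets and salt factor are made up for the example):

    import scala.util.Random
    import org.apache.spark.SparkContext._ // pair-RDD ops (needed on 1.2 and earlier)

    // large side is skewed: almost everything lands on one key
    val large = sc.parallelize(Seq.fill(1000)(("hotKey", 1)) ++ Seq(("rareKey", 2)))
    // small side can be cheaply replicated
    val small = sc.parallelize(Seq(("hotKey", "a"), ("rareKey", "b")))

    val saltFactor = 10

    // Spread each key on the large side across saltFactor sub-keys...
    val saltedLarge = large.map { case (k, v) => ((k, Random.nextInt(saltFactor)), v) }
    // ...and replicate the small side once per sub-key so every pair still meets.
    val saltedSmall = small.flatMap { case (k, v) =>
      (0 until saltFactor).map(i => ((k, i), v))
    }

    // Join on the salted key, then strip the salt.
    val joined = saltedLarge.join(saltedSmall).map { case ((k, _), pair) => (k, pair) }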
Re: [VOTE] Release Apache Spark 1.3.0 (RC1)
Excellent. Explicit toDF() works.

a) employees.toDF().registerTempTable("Employees") - works
b) Also affects saveAsParquetFile - orders.toDF().saveAsParquetFile

Adding to my earlier tests:

4.0 SQL from Scala and Python
4.1 result = sqlContext.sql("SELECT * from Employees WHERE State = 'WA'") OK
4.2 result = sqlContext.sql("SELECT OrderDetails.OrderID, ShipCountry, UnitPrice, Qty, Discount FROM Orders INNER JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
4.3 result = sqlContext.sql("SELECT ShipCountry, Sum(OrderDetails.UnitPrice * Qty * Discount) AS ProductSales FROM Orders INNER JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID GROUP BY ShipCountry") OK
4.4 saveAsParquetFile OK
4.5 Read and verify the 4.4 save - sqlContext.parquetFile, registerTempTable, sql OK

Cheers, thanks Michael
k/

On Thu, Feb 19, 2015 at 12:02 PM, Michael Armbrust mich...@databricks.com wrote:

> P.S.: For some reason, replacing import sqlContext.createSchemaRDD with import sqlContext.implicits._ doesn't do the implicit conversions. registerTempTable gives a syntax error. I will dig deeper tomorrow. Has anyone seen this?

We will write up a whole migration guide before the final release, but I can quickly explain this one. We made the implicit conversion significantly less broad to avoid the chance of confusing conflicts. However, you now have to call .toDF in order to turn RDDs into DataFrames.
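For reference, a minimal sketch of the 4.4/4.5 round trip (assuming a SparkContext sc; the Order case class and path are made up for the example):

    import org.apache.spark.sql.SQLContext

    case class Order(orderID: Int, shipCountry: String)

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val orders = sc.parallelize(Seq(Order(1, "USA"), Order(2, "Norway")))

    // 4.4: write out as Parquet (1.3-era API)
    orders.toDF().saveAsParquetFile("/tmp/orders.parquet")

    // 4.5: read it back and verify via registerTempTable + sql
    val restored = sqlContext.parquetFile("/tmp/orders.parquet")
    restored.registerTempTable("Orders")
    sqlContext.sql("SELECT COUNT(*) FROM Orders").show()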
Re: Replacing Jetty with TomCat
To add to Sean's and Reynold's point:

Please correct me if I'm wrong, but Spark depends on hadoop-common, which also uses Jetty in its HttpServer2 code. So even if you removed Jetty from Spark by making it an optional dependency, it would still be pulled in by Hadoop: a program depending on a hypothetical Spark-on-Tomcat would still pull in the Jetty jars.

-Ewan
Hive SKEWED feature supported in Spark SQL ?
I have done some testing of inserting into tables defined in Hive using 1.2, and I can see that the PARTITION clause is honored: data files get created in multiple subdirectories correctly. I also tried the SKEWED BY ... ON ... STORED AS DIRECTORIES clause on CREATE TABLE, but I didn't see subdirectories being created in that case.

1) Is SKEWED BY honored? If so, has anyone run into directories not being created?

2) If it is not honored, does it matter? Hive introduced this feature to better handle joins where tables had a skewed distribution on the keys joined on, so that the single mapper handling one of the keys didn't hold up the whole process. Could that happen in Spark / Spark SQL?

Thanks
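For concreteness, a minimal sketch of the kind of DDL being tested (the table and column names are made up; this assumes a HiveContext, and per the reply earlier in this digest the clause is accepted but not honored by Spark SQL):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)

    // Hive DDL: mark a hot key so Hive would split its rows into a
    // dedicated subdirectory instead of mixing them with other keys.
    hiveContext.sql("""
      CREATE TABLE clicks (user_id STRING, url STRING)
      SKEWED BY (user_id) ON ('hot_user')
      STORED AS DIRECTORIES
    """)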
Re: Replacing Jetty with TomCat
Sure, but you are not using Netty at all; it's invisible to you. It's not as if you have to set up and maintain a Jetty container. I don't think your single platform for your apps is relevant.

You can turn off the UI, but as Reynold said, the HTTP servers are also part of the core data transport functionality, and you can't turn that off.

It's not merely unsupported to swap this out for an arbitrary container; it's not clear it would work with Tomcat without re-integrating with its behavior and tuning. But it also shouldn't matter to anyone.
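For the narrower question of disabling the UI while keeping core functionality, a minimal sketch (spark.ui.enabled is the standard switch for the web UI; as noted above, the internal data-transport HTTP servers stay on regardless):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("no-ui-example")
      // Stops the Jetty-based web UI from starting; core execution is unaffected.
      .set("spark.ui.enabled", "false")

    val sc = new SparkContext(conf)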
Have Friedman's glmnet algo running in Spark
Dev List,

A couple of colleagues and I have gotten several versions of the glmnet algorithm coded and running on Spark RDDs. glmnet (http://www.jstatsoft.org/v33/i01/paper) is a very fast algorithm for generating coefficient paths that solve penalized regression with elastic-net penalties. It runs fast by taking an approach that generates solutions for a wide range of penalty parameters.

We're able to integrate it into the MLlib class structure in a couple of different ways. The algorithm may fit better into the new pipeline structure, since it naturally returns a multitude of models (corresponding to different values of the penalty parameters); that appears to fit pipelines better than, for example, MLlib linear regression. We've got regression running with the speed optimizations that Friedman recommends, and we'll start working on the logistic regression version next.

We're eager to make the code available as open source and would like some feedback about how best to do that. Any thoughts?

Mike Bowles.
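For readers unfamiliar with the approach, the speed comes from a coordinate-descent update built on a soft-thresholding operator. A toy sketch of that core update (assuming standardized features and the plain Gaussian case from Friedman et al. 2010; this is an illustration, not the authors' actual code):

    // Soft-threshold operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)
    def softThreshold(z: Double, gamma: Double): Double =
      math.signum(z) * math.max(math.abs(z) - gamma, 0.0)

    // One coordinate-descent update for feature j under the elastic-net
    // penalty (lambda = overall strength, alpha = L1/L2 mix). With
    // standardized x(_, j), the denominator simplifies to 1 + lambda*(1-alpha).
    def updateCoefficient(
        partialResidualDotXj: Double, // (1/n) * sum_i x_ij * r_i^(j)
        lambda: Double,
        alpha: Double): Double =
      softThreshold(partialResidualDotXj, lambda * alpha) / (1.0 + lambda * (1.0 - alpha))

Sweeping this update over the features, for a decreasing sequence of lambda values warm-started from the previous solution, is what produces the coefficient path cheaply.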
Spark SQL, Hive Parquet data types
Still trying to get my head around Spark SQL and Hive.

1) Let's assume I *only* use Spark SQL to create and insert data into Hive tables, declared in a Hive metastore. Does it matter at all whether Hive supports the data types I need with Parquet, or is all that matters what Catalyst / Spark's Parquet relation supports?

Case in point: timestamps in Parquet.
* Parquet now supports them, per https://github.com/Parquet/parquet-mr/issues/218
* Hive only supports them in 0.14

So would I be able to read/write timestamps natively in Spark 1.2? Spark 1.3?

I have found this thread, http://apache-spark-user-list.1001560.n3.nabble.com/timestamp-not-implemented-yet-td15414.html, which seems to indicate that the data types supported by Hive do matter to Spark SQL. If so, why is that? Doesn't the read path go through Spark SQL to read the Parquet file?

2) Is there planned support for Hive 0.14?

Thanks
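A quick way to probe this empirically, as a minimal sketch (assuming the 1.2-era createSchemaRDD implicit and a SparkContext sc; the path and case class are made up, and whether the write succeeds is exactly the question being asked):

    import java.sql.Timestamp
    import org.apache.spark.sql.SQLContext

    case class Event(id: Int, ts: Timestamp)

    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // 1.2: implicit RDD -> SchemaRDD conversion

    val events = sc.parallelize(Seq(Event(1, new Timestamp(System.currentTimeMillis))))

    // If TimestampType isn't supported on the Parquet write path,
    // this is where it should fail.
    events.saveAsParquetFile("/tmp/events.parquet")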
Re: [VOTE] Release Apache Spark 1.3.0 (RC1)
+1 (non-binding)

Tested Mesos coarse- and fine-grained mode on a 4-node Mesos cluster with a simple shuffle/map task. Will be testing with a more complete suite (i.e. spark-perf) once the infrastructure is set up to do so.

Tim
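For reference, a minimal sketch of switching between the two Mesos modes being tested (the master URL is made up; spark.mesos.coarse is the 1.x-era switch):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("mesos-smoke-test")
      .setMaster("mesos://zk://host1:2181/mesos") // made-up ZooKeeper URL
      // true = coarse-grained (one long-lived Mesos task per node);
      // false = fine-grained (one Mesos task per Spark task).
      .set("spark.mesos.coarse", "true")

    val sc = new SparkContext(conf)
    // simple shuffle/map smoke test, as in the vote
    sc.parallelize(1 to 1000).map(i => (i % 10, i)).reduceByKey(_ + _).count()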
Re: [VOTE] Release Apache Spark 1.3.0 (RC1)
+1 (non-binding)

- Verified signatures using [1]
- Built on Mac OS X Yosemite
- Built on Fedora 21

Each build was run against the Hadoop 2.4 version with the yarn, hive, and hive-thriftserver profiles. I am having trouble getting all the tests passing in a single run on both machines, but we have this same problem on other projects as well.

[1] https://github.com/cjnolet/nexus-staging-gpg-verify

On Wed, Feb 18, 2015 at 6:25 PM, Sean Owen so...@cloudera.com wrote:

> On Wed, Feb 18, 2015 at 6:13 PM, Patrick Wendell pwend...@gmail.com wrote:
>> Patrick, this link gives a 404: https://people.apache.org/keys/committer/pwendell.asc
> Works for me. Maybe it's some ephemeral issue?

Yes, it works now; I swear it didn't before! That's all set now. The signing key is in that file.
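For anyone reproducing the build, a sketch of the 1.3-era Maven invocation for that profile combination (per the Spark building docs; exact flags may vary by environment):

    mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package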