Re: Spark Job not exited and shows running

2016-11-30 Thread ayan guha
Can you add sc.stop at the end of the code and try?
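For example, a minimal sketch (assuming a Scala driver; the job logic shown is just a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf())
    try {
      // ... existing job logic ...
    } finally {
      sc.stop() // release the YARN containers so the application can exit
    }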
On 1 Dec 2016 18:03, "Daniel van der Ende" 
wrote:

> Hi,
>
> I've seen this a few times too. Usually it indicates that your driver
> doesn't have enough resources to process the result. Sometimes increasing
> driver memory is enough (yarn memory overhead can also help). Is there any
> specific reason for you to run in client mode and not in cluster mode?
> Having run into this a number of times (and wanting to spare the resources
> of our submitting machines) we have now switched to use yarn cluster mode
> by default. This seems to resolve the problem.
>
> Hope this helps,
>
> Daniel
>
> On 29 Nov 2016 11:20 p.m., "Selvam Raman"  wrote:
>
>> Hi,
>>
>> I have submitted a Spark job in YARN client mode. The executors and cores
>> were dynamically allocated. The job has 20 partitions, so 5 containers,
>> each with 4 cores, were allocated. It processed almost all the records,
>> but the job never exits, and in the application master container I am
>> seeing the error messages below.
>>
>>  INFO yarn.YarnAllocator: Canceling requests for 0 executor containers
>>  WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
>>
>>
>>
>> The same job, run on only 1,000 records, finished successfully.
>>
>> Can anyone help me sort out this issue?
>>
>> Spark version: 2.0 (AWS EMR).
>>
>> --
>> Selvam Raman
>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>
>


Re: Spark Job not exited and shows running

2016-11-30 Thread Daniel van der Ende
Hi,

I've seen this a few times too. Usually it indicates that your driver
doesn't have enough resources to process the result. Sometimes increasing
driver memory is enough (yarn memory overhead can also help). Is there any
specific reason for you to run in client mode and not in cluster mode?
Having run into this a number of times (and wanting to spare the resources
of our submitting machines) we have now switched to use yarn cluster mode
by default. This seems to resolve the problem.
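
For example, a submit command along these lines (the values and jar name are placeholders, not recommendations):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --driver-memory 4g \
      --conf spark.yarn.driver.memoryOverhead=1024 \
      your-application.jar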

Hope this helps,

Daniel

On 29 Nov 2016 11:20 p.m., "Selvam Raman"  wrote:

> Hi,
>
> I have submitted a Spark job in YARN client mode. The executors and cores
> were dynamically allocated. The job has 20 partitions, so 5 containers,
> each with 4 cores, were allocated. It processed almost all the records,
> but the job never exits, and in the application master container I am
> seeing the error messages below.
>
>  INFO yarn.YarnAllocator: Canceling requests for 0 executor containers
>  WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
>
>
>
> The same job, run on only 1,000 records, finished successfully.
>
> Can anyone help me sort out this issue?
>
> Spark version: 2.0 (AWS EMR).
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>


Re: Spark 2.0.2 , using DStreams in Spark Streaming . How do I create SQLContext? Please help

2016-11-30 Thread Deepak Sharma
In Spark 2.0 and later, SparkSession was introduced, and you can use it to
query Hive as well.
Just make sure you create the SparkSession with the enableHiveSupport() option.
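
For example, a minimal sketch in Scala (the app name and table name are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("streaming-app")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SELECT * FROM my_hive_table").show()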

Thanks
Deepak

On Thu, Dec 1, 2016 at 12:27 PM, shyla deshpande 
wrote:

> I am on Spark 2.0.2, using DStreams because I need a Cassandra sink.
>
> How do I create a SQLContext? I get an error that SQLContext is deprecated.
>
>
> *[image: Inline image 1]*
>
> *Thanks*
>
>


-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net


Re: SPARK-SUBMIT and optional args like -h etc

2016-11-30 Thread Daniel van der Ende
Hi,

Looks like the ordering of your parameters to spark-submit is different on
Windows vs. EMR. I assume the -h flag is an argument for your Python
script? In that case you'll need to put the arguments after the Python
script.
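
For example, reusing the names from your EMR command, the invocation would look something like:

    spark-submit How.py -h 0 hdfs://10.1.X.XXX:8020/user/hadoop/hello.dat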

Daniel

On 1 Dec 2016 6:24 a.m., "Patnaik, Vandana" 
wrote:

> Hello All,
>
> I am new to spark and am wondering how to pass an optional argument to my
> python program using SPARK-SUBMIT.
>
> This works fine on my local machine but not on AWS EMR:
>
> On Windows:
> C:\Vandana\spark\examples>..\bin\spark-submit new_profile_csv1.py -h 0 -t example_float.txt
>
> On EMR:
> spark-submit -h 0 How.py hdfs://10.1.X.XXX:8020/user/hadoop/hello.dat
>
> This does not work.
>
> Thanks
>
> Vandana
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Spark 2.0.2 , using DStreams in Spark Streaming . How do I create SQLContext? Please help

2016-11-30 Thread shyla deshpande
I am on Spark 2.0.2, using DStreams because I need a Cassandra sink.

How do I create a SQLContext? I get an error that SQLContext is deprecated.


*[image: Inline image 1]*

*Thanks*


Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-11-30 Thread kant kodali
Here is the other transformation that might cause the error; it has to be
one of these two, since I only have two transformations:

jsonMessagesDStream
    .window(new Duration(6), new Duration(1000))
    .mapToPair(new PairFunction<String, String, Long>() {
        @Override
        public Tuple2<String, Long> call(String s) throws Exception {
            //System.out.println(s + " *");
            JsonParser parser = new JsonParser();
            JsonObject jsonObj = parser.parse(s).getAsJsonObject();

            if (jsonObj != null && jsonObj.has("var1")) {
                JsonObject jsonObject = jsonObj.get("var1").getAsJsonObject();
                if (jsonObject != null && jsonObject.has("var2")
                        && jsonObject.get("var2").getAsBoolean() && jsonObject.has("var3")) {
                    long num = jsonObject.get("var3").getAsLong();
                    return new Tuple2<>("var3", num);
                }
            }

            return new Tuple2<>("var3", 0L);
        }
    }).reduceByKey(new Function2<Long, Long, Long>() {
        @Override
        public Long call(Long v1, Long v2) throws Exception {
            return v1 + v2;
        }
    }).foreachRDD(new VoidFunction<JavaPairRDD<String, Long>>() {
        @Override
        public void call(JavaPairRDD<String, Long> stringIntegerJavaPairRDD) throws Exception {
            Map<String, Long> map = new HashMap<>();
            Gson gson = new Gson();
            stringIntegerJavaPairRDD
                .collect()
                .forEach((Tuple2<String, Long> KV) -> {
                    String status = KV._1();
                    Long count = KV._2();
                    map.put(status, count);
                });
            NSQReceiver.send(producer, "dashboard", gson.toJson(map).getBytes());
        }
    });


On Wed, Nov 30, 2016 at 10:40 PM, kant kodali  wrote:

> Hi Marco,
>
>
> Here is what my code looks like
>
> Config config = new Config("hello");
> SparkConf sparkConf = config.buildSparkConfig();
> sparkConf.setJars(JavaSparkContext.jarOfClass(Driver.class));
> JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, new 
> Duration(config.getSparkStremingBatchInterval()));
> ssc.sparkContext().setLogLevel("ERROR");
>
>
> NSQReceiver sparkStreamingReceiver = new NSQReceiver(config, "input_test");
> JavaReceiverInputDStream jsonMessagesDStream = 
> ssc.receiverStream(sparkStreamingReceiver);
>
>
> NSQProducer producer = new NSQProducer()
> .addAddress(config.getServerConfig().getProperty("NSQD_IP"), 
> Integer.parseInt(config.getServerConfig().getProperty("NSQD_PORT")))
> .start();
>
> jsonMessagesDStream
> .mapToPair(new PairFunction() {
> @Override
> public Tuple2 call(String s) throws Exception {
> JsonParser parser = new JsonParser();
> JsonObject jsonObj = parser.parse(s).getAsJsonObject();
> if (jsonObj != null && jsonObj.has("var1") ) {
> JsonObject transactionObject = 
> jsonObj.get("var1").getAsJsonObject();
> if(transactionObject != null && 
> transactionObject.has("var2")) {
> String key = 
> transactionObject.get("var2").getAsString();
> return new Tuple2<>(key, 1);
> }
> }
> return new Tuple2<>("", 0);
> }
> }).reduceByKey(new Function2() {
> @Override
> public Integer call(Integer v1, Integer v2) throws Exception {
> return v1+v2;
> }
> }).foreachRDD(new VoidFunction>() {
> @Override
> public void call(JavaPairRDD 
> stringIntegerJavaPairRDD) throws Exception {
> Map map = new HashMap<>();
> Gson gson = new Gson();
> stringIntegerJavaPairRDD
> .collect()
> .forEach((Tuple2 KV) -> {
> String status = KV._1();
> Integer count = KV._2();
> map.put(status, count);
> }
> );
> NSQReceiver.send(producer, "output_777", 
> gson.toJson(map).getBytes());
> }
> });
>
>
> Thanks,
>
> kant
>
>
> On Wed, Nov 30, 2016 at 2:11 PM, Marco Mistroni 
> wrote:
>
>> Could you paste reproducible snippet code?
>> Kr
>>
>> On 30 Nov 2016 9:08 pm, "kant kodali" 

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-11-30 Thread kant kodali
Hi Marco,


Here is what my code looks like

Config config = new Config("hello");
SparkConf sparkConf = config.buildSparkConfig();
sparkConf.setJars(JavaSparkContext.jarOfClass(Driver.class));
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf,
        new Duration(config.getSparkStremingBatchInterval()));
ssc.sparkContext().setLogLevel("ERROR");

NSQReceiver sparkStreamingReceiver = new NSQReceiver(config, "input_test");
JavaReceiverInputDStream<String> jsonMessagesDStream =
        ssc.receiverStream(sparkStreamingReceiver);

NSQProducer producer = new NSQProducer()
        .addAddress(config.getServerConfig().getProperty("NSQD_IP"),
                Integer.parseInt(config.getServerConfig().getProperty("NSQD_PORT")))
        .start();

jsonMessagesDStream
        .mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) throws Exception {
                JsonParser parser = new JsonParser();
                JsonObject jsonObj = parser.parse(s).getAsJsonObject();
                if (jsonObj != null && jsonObj.has("var1")) {
                    JsonObject transactionObject = jsonObj.get("var1").getAsJsonObject();
                    if (transactionObject != null && transactionObject.has("var2")) {
                        String key = transactionObject.get("var2").getAsString();
                        return new Tuple2<>(key, 1);
                    }
                }
                return new Tuple2<>("", 0);
            }
        }).reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer v1, Integer v2) throws Exception {
                return v1 + v2;
            }
        }).foreachRDD(new VoidFunction<JavaPairRDD<String, Integer>>() {
            @Override
            public void call(JavaPairRDD<String, Integer> stringIntegerJavaPairRDD) throws Exception {
                Map<String, Integer> map = new HashMap<>();
                Gson gson = new Gson();
                stringIntegerJavaPairRDD
                        .collect()
                        .forEach((Tuple2<String, Integer> KV) -> {
                            String status = KV._1();
                            Integer count = KV._2();
                            map.put(status, count);
                        });
                NSQReceiver.send(producer, "output_777", gson.toJson(map).getBytes());
            }
        });


Thanks,

kant


On Wed, Nov 30, 2016 at 2:11 PM, Marco Mistroni  wrote:

> Could you paste reproducible snippet code?
> Kr
>
> On 30 Nov 2016 9:08 pm, "kant kodali"  wrote:
>
>> I have lot of these exceptions happening
>>
>> java.lang.Exception: Could not compute split, block input-0-1480539568000
>> not found
>>
>>
>> Any ideas what this could be?
>>
>


Re: updateStateByKey -- when the key is multi-column (like a composite key )

2016-11-30 Thread shyla deshpande
Thanks Miguel for the response.

It works great. I am using a tuple for my key, the values are Strings, and I
am returning a String state from the update function passed to updateStateByKey.
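
For reference, a minimal sketch of that shape (Scala; it assumes `pairs` is a DStream[((String, String), String)] and that checkpointing is enabled; the names are illustrative):

    // keep the latest value seen for each composite (String, String) key
    val updateFn: (Seq[String], Option[String]) => Option[String] =
      (newValues, state) => Some(newValues.lastOption.getOrElse(state.getOrElse("")))

    val stateStream = pairs.updateStateByKey[String](updateFn)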

On Wed, Nov 30, 2016 at 12:33 PM, Miguel Morales 
wrote:

> I *think* you can return a map to updateStateByKey which would include
> your fields.  Another approach would be to create a hash (like create a
> json version of the hash and return that.)
>
> On Wed, Nov 30, 2016 at 12:30 PM, shyla deshpande <
> deshpandesh...@gmail.com> wrote:
>
>> updateStateByKey - Can this be used when the key is multi-column (like a
>> composite key ) and the value is not numeric. All the examples I have come
>> across is where the key is a simple String and the Value is numeric.
>>
>> Appreciate any help.
>>
>> Thanks
>>
>
>


Re: Spark Standalone Cluster - Running applications in JSON format

2016-11-30 Thread Carl Ballantyne
8080 is just the normal web UI, which shows the information I want (i.e.,
Running Applications), but in HTML format. I want it in JSON so I don't
have to scrape and parse HTML.


From my understanding api/v1/applications should do the trick ... 
except it doesn't.


Ah well.

On 1/12/2016 4:00 PM, Miguel Morales wrote:

Don't have a Spark cluster up to verify this, but try port 8080.

http://spark-master-ip:8080/api/v1/applications.

But glad to hear you're getting somewhere, best of luck.

On Wed, Nov 30, 2016 at 9:59 PM, Carl Ballantyne 
> wrote:


Hmmm getting closer I think.

I thought this was only for Mesos and Yarn clusters (from reading
the documentation). I tried anyway and initially received
Connection Refused. So I ran ./start-history-server.sh. This was
on the Spark Master instance.

I now get 404 not found.

Nothing in the log file for the history server indicates there was
a problem.

I will keep digging around. Thanks for your help so far Miguel.


On 1/12/2016 3:33 PM, Miguel Morales wrote:

Try hitting:http://:18080/api/v1

Then hit /applications.

That should give you a list of running spark jobs on a given server.

On Wed, Nov 30, 2016 at 9:30 PM, Carl Ballantyne
   wrote:

Yes I was looking at this. But it says I need to access the driver 
-http://:4040.

I don't have a running driver Spark instance since I am submitting jobs to 
Spark using the SparkLauncher class. Or maybe I am missing something obvious. 
Apologies if so.




On 1/12/2016 3:21 PM, Miguel Morales wrote:

Check the Monitoring and Instrumentation 
API:http://spark.apache.org/docs/latest/monitoring.html


On Wed, Nov 30, 2016 at 9:20 PM, Carl Ballantyne 
  wrote:

Hi All,

I want to get the running applications for my Spark Standalone cluster in 
JSON format. The same information displayed on the web UI on port 8080 ... but 
in JSON.

Is there an easy way to do this? It seems I need to scrap the HTML page in 
order to get this information.

The reason I want to know this information is so I can ensure the Spark 
cluster does not get too many jobs submitted at once. A Stand alone cluster 
processes jobs FIFO. I would prefer to just send back a message to the user 
telling them to try later then submit a job which has to wait for other jobs to 
finish before starting.

Any help appreciated. Thanks.

Cheers,
Carl

-
To unsubscribe e-mail:user-unsubscr...@spark.apache.org



--
Carl Ballantyne
Lead Reporting Developer
Guvera Operations Pty Ltd.
Suite 1b, 58 Kingston Drive
Helensvale, QLD, 4212
Australia
PO Box 3330
Helensvale Town Centre, QLD, 4212
Phone +61 (0) 7 5578 8987
Email carl.ballant...@guvera.com
Web www.guveralimited.com


--
*Carl Ballantyne*
Lead Reporting Developer
*Guvera Operations Pty Ltd.*
Suite 1b, 58 Kingston Drive
Helensvale, QLD, 4212
Australia
PO Box 3330
Helensvale Town Centre, QLD, 4212
*Phone *+61 (0) 7 5578 8987
*Email *carl.ballant...@guvera.com
*Web *www.guveralimited.com


--
*Carl Ballantyne*
Lead Reporting Developer
*Guvera Operations Pty Ltd.*
Suite 1b, 58 Kingston Drive
Helensvale, QLD, 4212
Australia
PO Box 3330
Helensvale Town Centre, QLD, 4212
*Phone *+61 (0) 7 5578 8987
*Email *carl.ballant...@guvera.com
*Web *www.guveralimited.com





Re: Spark Standalone Cluster - Running applications in JSON format

2016-11-30 Thread Miguel Morales
Don't have a Spark cluster up to verify this, but try port 8080.

http://spark-master-ip:8080/api/v1/applications.

But glad to hear you're getting somewhere, best of luck.

On Wed, Nov 30, 2016 at 9:59 PM, Carl Ballantyne  wrote:

> Hmmm getting closer I think.
>
> I thought this was only for Mesos and Yarn clusters (from reading the
> documentation). I tried anyway and initially received Connection Refused.
> So I ran ./start-history-server.sh. This was on the Spark Master instance.
>
> I now get 404 not found.
>
> Nothing in the log file for the history server indicates there was a
> problem.
>
> I will keep digging around. Thanks for your help so far Miguel.
>
> On 1/12/2016 3:33 PM, Miguel Morales wrote:
>
> Try hitting:  http://:18080/api/v1
>
> Then hit /applications.
>
> That should give you a list of running spark jobs on a given server.
>
> On Wed, Nov 30, 2016 at 9:30 PM, Carl Ballantyne 
>  wrote:
>
> Yes I was looking at this. But it says I need to access the driver - 
> http://:4040.
>
> I don't have a running driver Spark instance since I am submitting jobs to 
> Spark using the SparkLauncher class. Or maybe I am missing something obvious. 
> Apologies if so.
>
>
>
>
> On 1/12/2016 3:21 PM, Miguel Morales wrote:
>
> Check the Monitoring and Instrumentation API: 
> http://spark.apache.org/docs/latest/monitoring.html
>
> On Wed, Nov 30, 2016 at 9:20 PM, Carl Ballantyne  
>  wrote:
>
> Hi All,
>
> I want to get the running applications for my Spark Standalone cluster in 
> JSON format. The same information displayed on the web UI on port 8080 ... 
> but in JSON.
>
> Is there an easy way to do this? It seems I need to scrap the HTML page in 
> order to get this information.
>
> The reason I want to know this information is so I can ensure the Spark 
> cluster does not get too many jobs submitted at once. A Stand alone cluster 
> processes jobs FIFO. I would prefer to just send back a message to the user 
> telling them to try later then submit a job which has to wait for other jobs 
> to finish before starting.
>
> Any help appreciated. Thanks.
>
> Cheers,
> Carl
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
> --
> Carl Ballantyne
> Lead Reporting Developer
> Guvera Operations Pty Ltd.
> Suite 1b, 58 Kingston Drive
> Helensvale, QLD, 4212
> Australia
> PO Box 3330
> Helensvale Town Centre, QLD, 4212
> Phone +61 (0) 7 5578 8987 <+61%207%205578%208987>
> Email carl.ballant...@guvera.com
> Web www.guveralimited.com
>
>
> --
> *Carl Ballantyne*
> Lead Reporting Developer
> *Guvera Operations Pty Ltd.*
> Suite 1b, 58 Kingston Drive
> Helensvale, QLD, 4212
> Australia
> PO Box 3330
> Helensvale Town Centre, QLD, 4212
> *Phone *+61 (0) 7 5578 8987 <+61%207%205578%208987>
> *Email *carl.ballant...@guvera.com
> *Web *www.guveralimited.com
>
>
>


Re: Spark Standalone Cluster - Running applications in JSON format

2016-11-30 Thread Carl Ballantyne

Hmmm getting closer I think.

I thought this was only for Mesos and Yarn clusters (from reading the 
documentation). I tried anyway and initially received Connection 
Refused. So I ran ./start-history-server.sh. This was on the Spark 
Master instance.


I now get 404 not found.

Nothing in the log file for the history server indicates there was a 
problem.


I will keep digging around. Thanks for your help so far Miguel.


On 1/12/2016 3:33 PM, Miguel Morales wrote:

Try hitting:  http://:18080/api/v1

Then hit /applications.

That should give you a list of running spark jobs on a given server.

On Wed, Nov 30, 2016 at 9:30 PM, Carl Ballantyne
 wrote:

Yes I was looking at this. But it says I need to access the driver - 
http://:4040.

I don't have a running driver Spark instance since I am submitting jobs to 
Spark using the SparkLauncher class. Or maybe I am missing something obvious. 
Apologies if so.




On 1/12/2016 3:21 PM, Miguel Morales wrote:

Check the Monitoring and Instrumentation API: 
http://spark.apache.org/docs/latest/monitoring.html

On Wed, Nov 30, 2016 at 9:20 PM, Carl Ballantyne  
wrote:

Hi All,

I want to get the running applications for my Spark Standalone cluster in JSON 
format. The same information displayed on the web UI on port 8080 ... but in 
JSON.

Is there an easy way to do this? It seems I need to scrap the HTML page in 
order to get this information.

The reason I want to know this information is so I can ensure the Spark cluster 
does not get too many jobs submitted at once. A Stand alone cluster processes 
jobs FIFO. I would prefer to just send back a message to the user telling them 
to try later then submit a job which has to wait for other jobs to finish 
before starting.

Any help appreciated. Thanks.

Cheers,
Carl

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



--
Carl Ballantyne
Lead Reporting Developer
Guvera Operations Pty Ltd.
Suite 1b, 58 Kingston Drive
Helensvale, QLD, 4212
Australia
PO Box 3330
Helensvale Town Centre, QLD, 4212
Phone +61 (0) 7 5578 8987
Email carl.ballant...@guvera.com
Web www.guveralimited.com




--
*Carl Ballantyne*
Lead Reporting Developer
*Guvera Operations Pty Ltd.*
Suite 1b, 58 Kingston Drive
Helensvale, QLD, 4212
Australia
PO Box 3330
Helensvale Town Centre, QLD, 4212
*Phone *+61 (0) 7 5578 8987
*Email *carl.ballant...@guvera.com 
*Web *www.guveralimited.com 




Re: Spark Standalone Cluster - Running applications in JSON format

2016-11-30 Thread Miguel Morales
Try hitting: http://<server-url>:18080/api/v1

Then hit /applications.

That should give you a list of running spark jobs on a given server.

On Wed, Nov 30, 2016 at 9:30 PM, Carl Ballantyne
 wrote:
>
> Yes I was looking at this. But it says I need to access the driver - 
> http://:4040.
>
> I don't have a running driver Spark instance since I am submitting jobs to 
> Spark using the SparkLauncher class. Or maybe I am missing something obvious. 
> Apologies if so.
>
>
>
>
> On 1/12/2016 3:21 PM, Miguel Morales wrote:
>
> Check the Monitoring and Instrumentation API: 
> http://spark.apache.org/docs/latest/monitoring.html
>
> On Wed, Nov 30, 2016 at 9:20 PM, Carl Ballantyne  
> wrote:
>>
>> Hi All,
>>
>> I want to get the running applications for my Spark Standalone cluster in 
>> JSON format. The same information displayed on the web UI on port 8080 ... 
>> but in JSON.
>>
>> Is there an easy way to do this? It seems I need to scrap the HTML page in 
>> order to get this information.
>>
>> The reason I want to know this information is so I can ensure the Spark 
>> cluster does not get too many jobs submitted at once. A Stand alone cluster 
>> processes jobs FIFO. I would prefer to just send back a message to the user 
>> telling them to try later then submit a job which has to wait for other jobs 
>> to finish before starting.
>>
>> Any help appreciated. Thanks.
>>
>> Cheers,
>> Carl
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>
>
> --
> Carl Ballantyne
> Lead Reporting Developer
> Guvera Operations Pty Ltd.
> Suite 1b, 58 Kingston Drive
> Helensvale, QLD, 4212
> Australia
> PO Box 3330
> Helensvale Town Centre, QLD, 4212
> Phone +61 (0) 7 5578 8987
> Email carl.ballant...@guvera.com
> Web www.guveralimited.com
>
>

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark Standalone Cluster - Running applications in JSON format

2016-11-30 Thread Carl Ballantyne
Yes, I was looking at this. But it says I need to access the driver -
http://<driver-node>:4040.


I don't have a running driver Spark instance since I am submitting jobs 
to Spark using the SparkLauncher class. Or maybe I am missing something 
obvious. Apologies if so.




On 1/12/2016 3:21 PM, Miguel Morales wrote:
Check the Monitoring and Instrumentation API: 
http://spark.apache.org/docs/latest/monitoring.html


On Wed, Nov 30, 2016 at 9:20 PM, Carl Ballantyne 
> wrote:


Hi All,

I want to get the running applications for my Spark Standalone
cluster in JSON format. The same information displayed on the web
UI on port 8080 ... but in JSON.

Is there an easy way to do this? It seems I need to scrap the HTML
page in order to get this information.

The reason I want to know this information is so I can ensure the
Spark cluster does not get too many jobs submitted at once. A
Stand alone cluster processes jobs FIFO. I would prefer to just
send back a message to the user telling them to try later then
submit a job which has to wait for other jobs to finish before
starting.

Any help appreciated. Thanks.

Cheers,
Carl

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org





--
*Carl Ballantyne*
Lead Reporting Developer
*Guvera Operations Pty Ltd.*
Suite 1b, 58 Kingston Drive
Helensvale, QLD, 4212
Australia
PO Box 3330
Helensvale Town Centre, QLD, 4212
*Phone *+61 (0) 7 5578 8987
*Email *carl.ballant...@guvera.com 
*Web *www.guveralimited.com 




SPARK-SUBMIT and optional args like -h etc

2016-11-30 Thread Patnaik, Vandana
Hello All, 

I am new to Spark and am wondering how to pass an optional argument to my
Python program using spark-submit.

This works fine on my local machine but not on AWS EMR: 

On Windows:
C:\Vandana\spark\examples>..\bin\spark-submit new_profile_csv1.py -h 0 -t example_float.txt

On EMR:
spark-submit -h 0 How.py hdfs://10.1.X.XXX:8020/user/hadoop/hello.dat

This does not work.

Thanks

Vandana

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark Standalone Cluster - Running applications in JSON format

2016-11-30 Thread Miguel Morales
Check the Monitoring and Instrumentation API:
http://spark.apache.org/docs/latest/monitoring.html

On Wed, Nov 30, 2016 at 9:20 PM, Carl Ballantyne  wrote:

> Hi All,
>
> I want to get the running applications for my Spark Standalone cluster in
> JSON format. The same information displayed on the web UI on port 8080 ...
> but in JSON.
>
> Is there an easy way to do this? It seems I need to scrap the HTML page in
> order to get this information.
>
> The reason I want to know this information is so I can ensure the Spark
> cluster does not get too many jobs submitted at once. A Stand alone cluster
> processes jobs FIFO. I would prefer to just send back a message to the user
> telling them to try later then submit a job which has to wait for other
> jobs to finish before starting.
>
> Any help appreciated. Thanks.
>
> Cheers,
> Carl
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Spark Standalone Cluster - Running applications in JSON format

2016-11-30 Thread Carl Ballantyne

Hi All,

I want to get the running applications for my Spark Standalone cluster 
in JSON format. The same information displayed on the web UI on port 
8080 ... but in JSON.


Is there an easy way to do this? It seems I need to scrape the HTML page
in order to get this information.


The reason I want to know this information is so I can ensure the Spark
cluster does not get too many jobs submitted at once. A standalone
cluster processes jobs FIFO. I would prefer to just send back a message
to the user telling them to try later, rather than submit a job which has to
wait for other jobs to finish before starting.


Any help appreciated. Thanks.

Cheers,
Carl

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: SVM regression in Spark

2016-11-30 Thread roni
Hi Spark experts,
 Can anyone help with doing SVR (support vector machine regression) in
Spark?
Thanks
R

On Tue, Nov 29, 2016 at 6:50 PM, roni  wrote:

> Hi All,
>  I am trying to port my R code to Spark. I am using SVM regression in R.
> It seems like Spark provides SVM classification.
> How can I get the regression results?
> In my R code  I am using  call to SVM () function in library("e1071") (
> ftp://cran.r-project.org/pub/R/web/packages/e1071/vignettes/svmdoc.pdf)
> svrObj <- svm(x ,
> y ,
> scale = TRUE,
> type = "nu-regression",
> kernel = "linear",
> nu = .9)
>
> Once I get the svm object back , I get the -
>  from the values.
>
> How can I do this in spark?
> Thanks in advance
> Roni
>
>


Re: PySpark to remote cluster

2016-11-30 Thread Felix Cheung
Spark 2.0.1 is running with a different py4j library than Spark 1.6.

You will probably run into other problems mixing versions though - is there a 
reason you can't run Spark 1.6 on the client?


_
From: Klaus Schaefers 
>
Sent: Wednesday, November 30, 2016 2:44 AM
Subject: PySpark to remote cluster
To: >


Hi,

I want to connect with a local Jupyter Notebook to a remote Spark cluster.
The Cluster is running Spark 2.0.1 and the Jupyter notebook is based on
Spark 1.6 and running in a docker image (Link). I try to init the
SparkContext like this:

import pyspark
sc = pyspark.SparkContext('spark://<spark-master-ip>:7077')

However, this gives me the following exception:


ERROR:py4j.java_gateway:Error while sending or receiving.
Traceback (most recent call last):
File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
line 746, in send_command
raise Py4JError("Answer from Java side is empty")
py4j.protocol.Py4JError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
line 626, in send_command
response = connection.send_command(command)
File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
line 750, in send_command
raise Py4JNetworkError("Error while sending or receiving", e)
py4j.protocol.Py4JNetworkError: Error while sending or receiving

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
line 740, in send_command
answer = smart_decode(self.stream.readline()[:-1])
File "/opt/conda/lib/python3.5/socket.py", line 575, in readinto
return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer
ERROR:py4j.java_gateway:An error occurred while trying to connect to the
Java server
Traceback (most recent call last):
File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
line 746, in send_command
raise Py4JError("Answer from Java side is empty")
py4j.protocol.Py4JError: Answer from Java side is empty

...

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.5/site-packages/IPython/utils/PyColorize.py",
line 262, in format2
for atoken in generate_tokens(text.readline):
File "/opt/conda/lib/python3.5/tokenize.py", line 597, in _tokenize
raise TokenError("EOF in multi-line statement", (lnum, 0))
tokenize.TokenError: ('EOF in multi-line statement', (2, 0))


Is this error caused by the different spark versions?

Best,

Klaus




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-to-remote-cluster-tp28147.html
Sent from the Apache Spark User List mailing list archive at 
Nabble.com.

-
To unsubscribe e-mail: 
user-unsubscr...@spark.apache.org





Unsubscribe

2016-11-30 Thread Sivakumar S



Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-11-30 Thread Marco Mistroni
Could you paste a reproducible code snippet?
Kr

On 30 Nov 2016 9:08 pm, "kant kodali"  wrote:

> I have lot of these exceptions happening
>
> java.lang.Exception: Could not compute split, block input-0-1480539568000
> not found
>
>
> Any ideas what this could be?
>


java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-11-30 Thread kant kodali
I have lot of these exceptions happening

java.lang.Exception: Could not compute split, block input-0-1480539568000
not found


Any ideas what this could be?


Re: updateStateByKey -- when the key is multi-column (like a composite key )

2016-11-30 Thread Miguel Morales
I *think* you can return a map to updateStateByKey which would include your
fields. Another approach would be to create a hash (e.g., create a JSON
version of the hash and return that).

On Wed, Nov 30, 2016 at 12:30 PM, shyla deshpande 
wrote:

> updateStateByKey - Can this be used when the key is multi-column (like a
> composite key ) and the value is not numeric. All the examples I have come
> across is where the key is a simple String and the Value is numeric.
>
> Appreciate any help.
>
> Thanks
>


updateStateByKey -- when the key is multi-column (like a composite key )

2016-11-30 Thread shyla deshpande
updateStateByKey - Can this be used when the key is multi-column (like a
composite key) and the value is not numeric? All the examples I have come
across are ones where the key is a simple String and the value is numeric.

Appreciate any help.

Thanks


Save the date: ApacheCon Miami, May 15-19, 2017

2016-11-30 Thread Rich Bowen
Dear Apache enthusiast,

ApacheCon and Apache Big Data will be held at the Intercontinental in
Miami, Florida, May 16-18, 2017. Submit your talks, and register, at
http://apachecon.com/  Talks aimed at the Big Data section of the event
should go to
http://events.linuxfoundation.org/events/apache-big-data-north-america/program/cfp
while other talks should go to
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp


ApacheCon is the best place to meet the people that develop the software
that you use and rely on. It’s also a great opportunity to deepen your
involvement in the project, and perhaps make the leap to contributing.
And we find that user case studies, showcasing how you use Apache
projects to solve real world problems, are very popular at this event.
So, do consider whether you have a use case that might make a good
presentation.

ApacheCon will have many different ways that you can participate:

Technical Content: We’ll have three days of technical sessions covering
many of the projects at the ASF. We’ll be publishing a schedule of talks
on March 9th, so that you can plan what you’ll be attending

BarCamp: The Apache BarCamp is a standard feature of ApacheCon - an
un-conference style event, where the schedule is determined on-site by
the attendees, and anything is fair game.

Lightning Talks: Even if you don’t give a full-length talk, the
Lightning Talks are five minute presentations on any topic related to
the ASF, and can be given by any attendee. If there’s something you’re
passionate about, consider giving a Lightning Talk.

Sponsor: It costs money to put on a conference, and this is a great
opportunity for companies involved in Apache projects, or who benefit
from Apache code - your employers - to get their name and products in
front of the community. Sponsors can start at any monetary level, and
can sponsor everything from the conference badge lanyard, through larger
items such as video recordings and evening events. For more information
on sponsoring ApacheCon, see http://apachecon.com/sponsor/

So, get your tickets today at http://apachecon.com/ and submit your
talks. ApacheCon Miami is going to be our best ApacheCon yet, and you,
and your project, can’t afford to miss it.

-- 
Rich Bowen - rbo...@apache.org
VP, Conferences
http://apachecon.com
@apachecon


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Can't read tables written in Spark 2.1 in Spark 2.0 (and earlier)

2016-11-30 Thread Reynold Xin
This should fix it: https://github.com/apache/spark/pull/16080



On Wed, Nov 30, 2016 at 10:55 AM, Timur Shenkao  wrote:

> Hello,
>
> Yes, I used hiveContext, sqlContext, sparkSession from Java, Scala,
> Python.
> Via spark-shell, spark-submit, IDE (PyCharm, Intellij IDEA).
> Everything is perfect because I have Hadoop cluster with configured &
> tuned HIVE.
>
> The reason of Michael's error is usually misconfigured or absent HIVE.
> Or may be absence of hive-site.xml in $SPARK_HOME/conf/ directory.
>
> On Wed, Nov 30, 2016 at 9:30 PM, Gourav Sengupta <
> gourav.sengu...@gmail.com> wrote:
>
>> Hi Timur,
>>
>> did you use hiveContext or sqlContext or the spark way mentioned in the
>> http://spark.apache.org/docs/latest/sql-programming-guide.html?
>>
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Wed, Nov 30, 2016 at 5:35 PM, Yin Huai  wrote:
>>
>>> Hello Michael,
>>>
>>> Thank you for reporting this issue. It will be fixed by
>>> https://github.com/apache/spark/pull/16080.
>>>
>>> Thanks,
>>>
>>> Yin
>>>
>>> On Tue, Nov 29, 2016 at 11:34 PM, Timur Shenkao 
>>> wrote:
>>>
 Hi!

 Do you have real HIVE installation?
 Have you built Spark 2.1 & Spark 2.0 with HIVE support ( -Phive
 -Phive-thriftserver ) ?

 It seems that you use "default" Spark's HIVE 1.2.1. Your metadata is
 stored in local Derby DB which is visible to concrete Spark installation
 but not for all.

 On Wed, Nov 30, 2016 at 4:51 AM, Michael Allman 
 wrote:

> This is not an issue with all tables created in Spark 2.1, though I'm
> not sure why some work and some do not. I have found that a table created
> as such
>
> sql("create table test stored as parquet as select 1")
>
> in Spark 2.1 cannot be read in previous versions of Spark.
>
> Michael
>
>
> > On Nov 29, 2016, at 5:15 PM, Michael Allman 
> wrote:
> >
> > Hello,
> >
> > When I try to read from a Hive table created by Spark 2.1 in Spark
> 2.0 or earlier, I get an error:
> >
> > java.lang.ClassNotFoundException: Failed to load class for data
> source: hive.
> >
> > Is there a way to get previous versions of Spark to read tables
> written with Spark 2.1?
> >
> > Cheers,
> >
> > Michael
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

>>>
>>
>


Re: Can't read tables written in Spark 2.1 in Spark 2.0 (and earlier)

2016-11-30 Thread Timur Shenkao
Hello,

Yes, I used hiveContext, sqlContext, and sparkSession from Java, Scala, and Python,
via spark-shell, spark-submit, and IDEs (PyCharm, IntelliJ IDEA).
Everything works perfectly because I have a Hadoop cluster with a configured and
tuned Hive.

The reason for Michael's error is usually a misconfigured or absent Hive,
or maybe the absence of hive-site.xml in the $SPARK_HOME/conf/ directory.

On Wed, Nov 30, 2016 at 9:30 PM, Gourav Sengupta 
wrote:

> Hi Timur,
>
> did you use hiveContext or sqlContext or the spark way mentioned in the
> http://spark.apache.org/docs/latest/sql-programming-guide.html?
>
>
> Regards,
> Gourav Sengupta
>
> On Wed, Nov 30, 2016 at 5:35 PM, Yin Huai  wrote:
>
>> Hello Michael,
>>
>> Thank you for reporting this issue. It will be fixed by
>> https://github.com/apache/spark/pull/16080.
>>
>> Thanks,
>>
>> Yin
>>
>> On Tue, Nov 29, 2016 at 11:34 PM, Timur Shenkao 
>> wrote:
>>
>>> Hi!
>>>
>>> Do you have real HIVE installation?
>>> Have you built Spark 2.1 & Spark 2.0 with HIVE support ( -Phive
>>> -Phive-thriftserver ) ?
>>>
>>> It seems that you use "default" Spark's HIVE 1.2.1. Your metadata is
>>> stored in local Derby DB which is visible to concrete Spark installation
>>> but not for all.
>>>
>>> On Wed, Nov 30, 2016 at 4:51 AM, Michael Allman 
>>> wrote:
>>>
 This is not an issue with all tables created in Spark 2.1, though I'm
 not sure why some work and some do not. I have found that a table created
 as such

 sql("create table test stored as parquet as select 1")

 in Spark 2.1 cannot be read in previous versions of Spark.

 Michael


 > On Nov 29, 2016, at 5:15 PM, Michael Allman 
 wrote:
 >
 > Hello,
 >
 > When I try to read from a Hive table created by Spark 2.1 in Spark
 2.0 or earlier, I get an error:
 >
 > java.lang.ClassNotFoundException: Failed to load class for data
 source: hive.
 >
 > Is there a way to get previous versions of Spark to read tables
 written with Spark 2.1?
 >
 > Cheers,
 >
 > Michael


 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


>>>
>>
>


Re: Can't read tables written in Spark 2.1 in Spark 2.0 (and earlier)

2016-11-30 Thread Gourav Sengupta
Hi Timur,

did you use hiveContext or sqlContext or the spark way mentioned in the
http://spark.apache.org/docs/latest/sql-programming-guide.html?


Regards,
Gourav Sengupta

On Wed, Nov 30, 2016 at 5:35 PM, Yin Huai  wrote:

> Hello Michael,
>
> Thank you for reporting this issue. It will be fixed by
> https://github.com/apache/spark/pull/16080.
>
> Thanks,
>
> Yin
>
> On Tue, Nov 29, 2016 at 11:34 PM, Timur Shenkao  wrote:
>
>> Hi!
>>
>> Do you have real HIVE installation?
>> Have you built Spark 2.1 & Spark 2.0 with HIVE support ( -Phive
>> -Phive-thriftserver ) ?
>>
>> It seems that you use "default" Spark's HIVE 1.2.1. Your metadata is
>> stored in local Derby DB which is visible to concrete Spark installation
>> but not for all.
>>
>> On Wed, Nov 30, 2016 at 4:51 AM, Michael Allman 
>> wrote:
>>
>>> This is not an issue with all tables created in Spark 2.1, though I'm
>>> not sure why some work and some do not. I have found that a table created
>>> as such
>>>
>>> sql("create table test stored as parquet as select 1")
>>>
>>> in Spark 2.1 cannot be read in previous versions of Spark.
>>>
>>> Michael
>>>
>>>
>>> > On Nov 29, 2016, at 5:15 PM, Michael Allman 
>>> wrote:
>>> >
>>> > Hello,
>>> >
>>> > When I try to read from a Hive table created by Spark 2.1 in Spark 2.0
>>> or earlier, I get an error:
>>> >
>>> > java.lang.ClassNotFoundException: Failed to load class for data
>>> source: hive.
>>> >
>>> > Is there a way to get previous versions of Spark to read tables
>>> written with Spark 2.1?
>>> >
>>> > Cheers,
>>> >
>>> > Michael
>>>
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
>>
>


SPARK 2.0 CSV exports (https://issues.apache.org/jira/browse/SPARK-16893)

2016-11-30 Thread Gourav Sengupta
Hi Sean,

I think that the main issue was users importing the package when starting
Spark, just as we used to do in Spark 1.6. After removing that option from
--packages when starting Spark 2.0, the issue of conflicting libraries
disappeared.

I have written about this in
https://github.com/databricks/spark-csv/issues/367. But perhaps mentioning
this in this email group as well helps.


Regards,
Gourav Sengupta


Re: Can't read tables written in Spark 2.1 in Spark 2.0 (and earlier)

2016-11-30 Thread Yin Huai
Hello Michael,

Thank you for reporting this issue. It will be fixed by
https://github.com/apache/spark/pull/16080.

Thanks,

Yin

On Tue, Nov 29, 2016 at 11:34 PM, Timur Shenkao  wrote:

> Hi!
>
> Do you have real HIVE installation?
> Have you built Spark 2.1 & Spark 2.0 with HIVE support ( -Phive
> -Phive-thriftserver ) ?
>
> It seems that you use "default" Spark's HIVE 1.2.1. Your metadata is
> stored in local Derby DB which is visible to concrete Spark installation
> but not for all.
>
> On Wed, Nov 30, 2016 at 4:51 AM, Michael Allman 
> wrote:
>
>> This is not an issue with all tables created in Spark 2.1, though I'm not
>> sure why some work and some do not. I have found that a table created as
>> such
>>
>> sql("create table test stored as parquet as select 1")
>>
>> in Spark 2.1 cannot be read in previous versions of Spark.
>>
>> Michael
>>
>>
>> > On Nov 29, 2016, at 5:15 PM, Michael Allman 
>> wrote:
>> >
>> > Hello,
>> >
>> > When I try to read from a Hive table created by Spark 2.1 in Spark 2.0
>> or earlier, I get an error:
>> >
>> > java.lang.ClassNotFoundException: Failed to load class for data
>> source: hive.
>> >
>> > Is there a way to get previous versions of Spark to read tables written
>> with Spark 2.1?
>> >
>> > Cheers,
>> >
>> > Michael
>>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>


Parallel dynamic partitioning producing duplicated data

2016-11-30 Thread Mehdi Ben Haj Abbes
Hi Folks,



I have a Spark job that reads a CSV file into a DataFrame. I register that
DataFrame as a temp table, then I write that DataFrame/temp table to a Hive
external table (using Parquet format for storage).

I’m using this kind of command:

hiveContext.sql("INSERT INTO TABLE t PARTITION(statPart='string_value', dynPart) SELECT * FROM tempTable");
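
A minimal sketch of that flow (assuming Spark 2.x with Hive support, with a SparkSession in place of hiveContext; the path and names are placeholders):

    val df = spark.read.option("header", "true").csv("/path/to/input.csv")
    df.createOrReplaceTempView("tempTable")
    spark.sql("INSERT INTO TABLE t PARTITION(statPart='string_value', dynPart) SELECT * FROM tempTable")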



Through this integration, for each CSV line I get one Parquet record. So if
I count the total number of lines in the CSV files, it must equal the count
of the Parquet dataset produced.



I launch 20 of these jobs in parallel (to take advantage of idle
resources). Sometimes the Parquet count is randomly slightly bigger than the
CSV count (the difference mainly concerns one dynamic partition and one CSV
file that has been integrated), but if I launch these jobs sequentially, one
after the other, I never see the count mismatch.



Does anyone have any idea about the cause of this problem (the differing
counts)? For me it is obvious that the parallel execution is causing the
issue, and I strongly believe it happens when moving data from the
hive.exec.stagingdir.prefix directory to the final Hive table location on HDFS.



Thanks in advance.


PySpark to remote cluster

2016-11-30 Thread Klaus Schaefers
Hi,

I want to connect from a local Jupyter notebook to a remote Spark cluster.
The cluster is running Spark 2.0.1 and the Jupyter notebook is based on
Spark 1.6 and running in a Docker image (Link). I try to initialize the
SparkContext like this:

import pyspark
sc = pyspark.SparkContext('spark://<spark-master-ip>:7077')

However, this gives me the following exception:


ERROR:py4j.java_gateway:Error while sending or receiving.
Traceback (most recent call last):
  File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
line 746, in send_command
raise Py4JError("Answer from Java side is empty")
py4j.protocol.Py4JError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
line 626, in send_command
response = connection.send_command(command)
  File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
line 750, in send_command
raise Py4JNetworkError("Error while sending or receiving", e)
py4j.protocol.Py4JNetworkError: Error while sending or receiving

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
line 740, in send_command
answer = smart_decode(self.stream.readline()[:-1])
  File "/opt/conda/lib/python3.5/socket.py", line 575, in readinto
return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer
ERROR:py4j.java_gateway:An error occurred while trying to connect to the
Java server
Traceback (most recent call last):
  File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
line 746, in send_command
raise Py4JError("Answer from Java side is empty")
py4j.protocol.Py4JError: Answer from Java side is empty

…

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/site-packages/IPython/utils/PyColorize.py",
line 262, in format2
for atoken in generate_tokens(text.readline):
  File "/opt/conda/lib/python3.5/tokenize.py", line 597, in _tokenize
raise TokenError("EOF in multi-line statement", (lnum, 0))
tokenize.TokenError: ('EOF in multi-line statement', (2, 0))


Is this error caused by the different spark versions?

Best,

Klaus




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-to-remote-cluster-tp28147.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Can I have two different receivers for my Spark client program?

2016-11-30 Thread kant kodali
HI All,

I am wondering if it makes sense to have two receivers inside my Spark
Client program?

The use case is as follows.

1) We have to support a feed from Kafka, so this will be direct receiver
#1. We need to perform batch inserts from the Kafka feed into Cassandra.

2) A gRPC receiver, where we get an RPC or HTTP request (since gRPC uses
HTTP/2) to retrieve a record, perform a table scan, or select a good chunk
of records from Cassandra. So this is more like a microservice with a Spark
Streaming layer underneath, where some of the requests, such as batch reads
and batch inserts, will go through Spark.

Any feedback or thoughts will be great!

Thanks!


Unsubscribe

2016-11-30 Thread Aditya

Unsubscribe




-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Logistic regression using gradient ascent

2016-11-30 Thread Meeraj Kunnumpurath
Hello,

I have been trying to implement logistic regression using gradient ascent,
out of curiosity. I am using the Spark ML feature extraction packages and
DataFrames, and not any of the implemented algorithms. I will be grateful if
any of you could please cast an eye over it and provide some feedback.

https://github.com/kunnum/sandbox/blob/master/classification/src/main/scala/com/ss/ml/classification/lr/LRWithGradientAscent.scala
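
For reference, this is the kind of update step I am trying to implement (a plain Scala sketch, independent of the Spark code; names are illustrative):

    // one gradient-ascent step on the log-likelihood:
    //   w <- w + alpha * sum_i (y_i - sigmoid(w . x_i)) * x_i
    def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

    def step(w: Array[Double], data: Seq[(Array[Double], Double)], alpha: Double): Array[Double] = {
      val grad = Array.fill(w.length)(0.0)
      for ((x, y) <- data) {
        val err = y - sigmoid(w.zip(x).map { case (wi, xi) => wi * xi }.sum)
        for (j <- w.indices) grad(j) += err * x(j)
      }
      w.zip(grad).map { case (wi, g) => wi + alpha * g }
    }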

Regards

-- 
*Meeraj Kunnumpurath*


*Director and Executive Principal*
*Service Symphony Ltd*
*00 44 7702 693597*
*00 971 50 409 0169*
mee...@servicesymphony.com