Re: Regarding Remote ES Cluster with Pio

2016-11-19 Thread Hasan Can Saral
Hi!

There might be an issue with basic auth; I have not tried to configure pio with
an ES server behind basic auth. From the error you get, I understand that pio
does not seem to be happy with (or is even unable to find) the hosts you provided.
Also, what port is your ES cluster listening on? Can you try 9300 and 9200 explicitly?
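
For reference, this is roughly the shape I would try first (cluster name and
host are placeholders): as far as I know, the transport client that pio uses
expects a bare hostname rather than a URL, does not understand user:password@
credentials, and talks to the native transport port (9300 by default) rather
than the HTTP port (9200):

# must match the cluster.name of your remote ES cluster
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=my_remote_cluster
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=es.example.com
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300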


> On Nov 17, 2016, at 5:26 PM, Harsh Mathur  wrote:
> 
> Hi PredictionIO developers,
> First of all, thank you for a great open source product.
> 
> I am Harsh. I am deploying the system in production and have an ES 
> instance as a managed service. I am not able to make pio use my managed ES 
> instance instead of installing a local ES. Thanks a lot for all the help 
> in advance.
> 
> I have a ES config in form: https://user:password@host
> ports available:
> 1. x: for http
> 2. y: for native java node clients
> 
> I tried editing pio-env.sh as follows:
> 
> # Elasticsearch Example
> PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
> PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=elasticsearch
> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=https://user:password@host
> PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=native_java_port
> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-1.5.2
> 
> 
> But Pio is not able to find any nodes:
> 
> [INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
> 
> [WARN] [netty] [Aftershock] exception caught on transport layer [[id: 
> 0x63808344]], closing connection
> 
> [ERROR] [Console$] Unable to connect to all storage backends successfully. 
> The following shows the error message from the storage backend.
> 
> [ERROR] [Console$] None of the configured nodes are available: [] 
> (org.elasticsearch.client.transport.NoNodeAvailableException)
> 
> [ERROR] [Console$] Dumping configuration of initialized storage backend 
> sources. Please make sure they are correct.
> 
> Regards
> Harsh Mathur
> harshmathur.1...@gmail.com 
> 
> “Perseverance is the hard work you do after you get tired of doing the hard 
> work you already did.”
> 



Re: How to access Spark Context in predict?

2016-09-27 Thread Hasan Can Saral
Hi Kenneth & Donald,

That was really clarifying, thank you. I really appreciate it. So now I
know that:

1- I should use LEventStore and query HBase without sc, with as little
processing as possible in predict,
2- In this case I don't have to extend PersistentModel, since I will not
need sc in predict.
3- If I need sc and batch processing in predict, I can save the RandomForest
trees to a file and load them from there. As far as I can see, my
only option to access sc for PEventStore is to add a dummy RDD to the
model and use dummyRDD.context.

Am I correct, especially in the 3rd point?
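
To make sure I picture the 3rd point right, here is a minimal sketch (all
names are made up):

import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.tree.model.RandomForestModel
import org.apache.spark.rdd.RDD

class SomeModel(
  val randomForestModel: RandomForestModel,
  val dummyRDD: RDD[Int] // carried along only so predict can reach sc
) extends Serializable {

  // Recover the live SparkContext from the dummy RDD when a batch
  // operation (e.g. a PEventStore query) is unavoidable in predict:
  def sc: SparkContext = dummyRDD.context

  // The usual fast path stays local and needs no context at all:
  def predictLocal(features: Vector): Double =
    randomForestModel.predict(features)
}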

Thank you again,
Hasan

On Tue, Sep 27, 2016 at 9:00 AM, Kenneth Chan <kenn...@apache.org> wrote:

> Hasan,
>
> Spark's random forest model doesn't need an RDD; it's much simpler to serialize
> it and use it in local memory in predict().
> see example here.
> https://github.com/PredictionIO/template-scala-parallel-leadscoring/blob/
> develop/src/main/scala/RFAlgorithm.scala
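>
> If you'd rather persist it yourself, MLlib's RandomForestModel also has
> built-in save/load; a quick sketch (the path is a placeholder, and model,
> sc and query are assumed to be in scope):
>
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.tree.model.RandomForestModel
>
> // at the end of train():
> model.save(sc, "hdfs:///models/rf")
> // later, to get it back (a SparkContext is needed only for loading):
> val restored = RandomForestModel.load(sc, "hdfs:///models/rf")
> // predict() is then a purely local, fast call:
> val label = restored.predict(Vectors.dense(query.features))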
>
> For accessing the event store in predict(), you should use the LEventStore API
> (not the PEventStore API) to get fast queries for specific events.
>
> (use PEventStore API if you really want to do batch processing again in
> predict() and need RDD for it)
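>
> A typical lookup in predict() then looks roughly like this (app, entity
> and event names are placeholders; query.userId is an assumed field on
> your Query class, and on pre-Apache releases the package is
> io.prediction.data.store):
>
> import org.apache.predictionio.data.store.LEventStore
> import scala.concurrent.duration._
>
> // Latest few events for one entity, with a hard timeout so that
> // predict() stays fast even when the event store is slow:
> val recentEvents = LEventStore.findByEntity(
>   appName = "MyApp",
>   entityType = "user",
>   entityId = query.userId,
>   eventNames = Some(Seq("view")),
>   limit = Some(10),
>   latest = true,
>   timeout = 200.millis
> ).toList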
>
>
> Kenneth
>
>
> On Mon, Sep 26, 2016 at 9:19 PM, Donald Szeto <don...@apache.org> wrote:
>
>> Hi Hasan,
>>
>> Does your randomForestModel contain any RDD?
>>
>> If so, implement your algorithm by extending PAlgorithm, have your model
>> extend PersistentModel, and implement PersistentModelLoader to save and
>> load your model. You will be able to perform RDD operations within
>> predict() by using the model's RDD.
>>
>> If not, implement your algorithm by extending P2LAlgorithm, and see if
>> PredictionIO can automatically persist the model for you. The convention
>> assumes that a non-RDD model does not require Spark to perform any RDD
>> operations, so there will be no SparkContext access.
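>>
>> A minimal sketch of the second case (PreparedData, Query and
>> PredictedResult are assumed to be the usual template case classes, e.g.
>> PreparedData(points: RDD[LabeledPoint]), Query(features: Array[Double]),
>> PredictedResult(label: Double)):
>>
>> import org.apache.predictionio.controller.P2LAlgorithm
>> import org.apache.spark.SparkContext
>> import org.apache.spark.mllib.linalg.Vectors
>> import org.apache.spark.mllib.tree.RandomForest
>> import org.apache.spark.mllib.tree.model.RandomForestModel
>>
>> class RFAlgorithm
>>   extends P2LAlgorithm[PreparedData, RandomForestModel, Query, PredictedResult] {
>>
>>   def train(sc: SparkContext, data: PreparedData): RandomForestModel =
>>     RandomForest.trainClassifier(data.points, numClasses = 2,
>>       categoricalFeaturesInfo = Map.empty[Int, Int], numTrees = 10,
>>       featureSubsetStrategy = "auto", impurity = "gini",
>>       maxDepth = 5, maxBins = 32)
>>
>>   // No RDD in the model, so PredictionIO persists it automatically and
>>   // predict() needs no SparkContext:
>>   def predict(model: RandomForestModel, query: Query): PredictedResult =
>>     PredictedResult(model.predict(Vectors.dense(query.features)))
>> }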
>>
>> Are these conventions not fitting your use case? Feedback is always
>> welcome for improving PredictionIO.
>>
>> Regards,
>> Donald
>>
>>
>> On Mon, Sep 26, 2016 at 9:05 AM, Hasan Can Saral <hasancansa...@gmail.com> wrote:
>>
>>> Hi Marcin,
>>>
>>> I did look at the definition of PersistentModel and indeed replaced
>>> LocalFileSystemPersistentModel with PersistentModel. Thank you for this; I
>>> really appreciate your help.
>>>
>>> However, I am having quite a hard time understanding how I can access the sc
>>> object that PredictionIO provides to the save and apply methods from within
>>> the predict method.
>>>
>>> class SomeModel(randomForestModel: RandomForestModel, dummyRDD: RDD[_])
>>>   extends PersistentModel[SomeAlgorithmParams] {
>>>
>>>   override def save(id: String, params: SomeAlgorithmParams,
>>>       sc: SparkContext): Boolean = {
>>>     // Here I should save randomForestModel to a file, but how to?
>>>     // Tried saveAsObjectFile but no luck.
>>>     true
>>>   }
>>> }
>>>
>>> object SomeModel
>>>   extends PersistentModelLoader[SomeAlgorithmParams, SomeModel] {
>>>   override def apply(id: String, params: SomeAlgorithmParams,
>>>       sc: Option[SparkContext]): SomeModel = {
>>>     // Here should I load randomForestModel from file? How?
>>>     new SomeModel(randomForestModel)
>>>   }
>>> }
>>>
>>> So, my questions have become:
>>> 1- Can I save randomForestModel? If yes, how? If I cannot, I will have
>>> to return false and retrain upon deployment. How do I skip pio train in
>>> this case?
>>> 2- How do I load the saved randomForestModel from file? If I cannot, do I
>>> remove the object SomeModel extends PersistentModelLoader altogether?
>>> 3- How do I access sc within predict? Do I save a dummy RDD, load it in
>>> apply, and call .context on it? In that case, what happens to randomForestModel?
>>>
>>> I am really quite confused and would really appreciate some help/sample
>>> code if you have time.
>>> Thank you.
>>> Hasan
>>>
>>>
>>> On Mon, Sep 26, 2016 at 2:56 PM, Marcin Ziemiński <ziem...@gmail.com>
>>> wrote:
>>>
>>>> Hi Hasan,
>>>>
>>>> So I guess there are two things here:
>>>> 1. You need SparkContext for predictions
>>>> 2. You also need to retrain your model during loading
>>>>
>>>> Please, look at the definition of PersistentModel and the comments attached.

Re: How to access Spark Context in predict?

2016-09-26 Thread Hasan Can Saral
Hi Marcin,

I did look at the definition of PersistentModel and indeed replaced
LocalFileSystemPersistentModel with PersistentModel. Thank you for this; I
really appreciate your help.

However, I am having quite a hard time understanding how I can access the sc
object that PredictionIO provides to the save and apply methods from within
the predict method.

class SomeModel(randomForestModel: RandomForestModel, dummyRDD: RDD[_])
  extends PersistentModel[SomeAlgorithmParams] {

  override def save(id: String, params: SomeAlgorithmParams,
      sc: SparkContext): Boolean = {
    // Here I should save randomForestModel to a file, but how to?
    // Tried saveAsObjectFile but no luck.
    true
  }
}

object SomeModel
  extends PersistentModelLoader[SomeAlgorithmParams, SomeModel] {
  override def apply(id: String, params: SomeAlgorithmParams,
      sc: Option[SparkContext]): SomeModel = {
    // Here should I load randomForestModel from file? How?
    new SomeModel(randomForestModel)
  }
}

So, my questions have become:
1- Can I save randomForestModel? If yes, how? If I cannot, I will have to
return false and retrain upon deployment. How do I skip pio train in this
case?
2- How do I load the saved randomForestModel from file? If I cannot, do I
remove the object SomeModel extends PersistentModelLoader altogether?
3- How do I access sc within predict? Do I save a dummy RDD, load it in
apply, and call .context on it? In that case, what happens to randomForestModel?
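
For example, for 1 and 2, is something like this the intended approach? (The
path is made up; save/load here are MLlib's built-ins on RandomForestModel.)

override def save(id: String, params: SomeAlgorithmParams,
    sc: SparkContext): Boolean = {
  randomForestModel.save(sc, s"/tmp/$id/rf-model")
  true
}

override def apply(id: String, params: SomeAlgorithmParams,
    sc: Option[SparkContext]): SomeModel = {
  // sc is an Option here; loading needs a live context:
  new SomeModel(RandomForestModel.load(sc.get, s"/tmp/$id/rf-model"))
}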

I am really quite confused and would really appreciate some help/sample
code if you have time.
Thank you.
Hasan


On Mon, Sep 26, 2016 at 2:56 PM, Marcin Ziemiński <ziem...@gmail.com> wrote:

> Hi Hasan,
>
> So I guess there are two things here:
> 1. You need SparkContext for predictions
> 2. You also need to retrain your model during loading
>
> Please, look at the definition of PersistentModel and the comments
> attached:
>
> trait PersistentModel[AP <: Params] {
>   /** Save the model to some persistent storage.
>     *
>     * This method should return true if the model has been saved successfully so
>     * that PredictionIO knows that it can be restored later during deployment.
>     * This method should return false if the model cannot be saved (or should
>     * not be saved due to configuration) so that PredictionIO will re-train the
>     * model during deployment. All arguments of this method are provided
>     * automatically by PredictionIO.
>     *
>     * @param id ID of the run that trained this model.
>     * @param params Algorithm parameters that were used to train this model.
>     * @param sc An Apache Spark context.
>     */
>   def save(id: String, params: AP, sc: SparkContext): Boolean
> }
>
> In order to achieve the desired result you could simply use
> PersistentModel instead of LocalFileSystemPersistentModel and return false
> from save. Then during deployment your model will be retrained through your
> Algorithm implementation. You shouldn't need to retrain your model in
> implementations of PersistentModelLoader - this is rather for loading
> models that are already trained and stored somewhere.
> You can save the SparkContext instance provided to the train method for use
> in predict(...) (assuming that your algorithm is an instance of PAlgorithm
> or P2LAlgorithm). Thus you should have what you need.
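>
> Roughly like this (a sketch only: I assume a plain SomeModel(randomForestModel)
> wrapper and the usual template case classes, and keeping sc in a field only
> works because, with save returning false, train and predict run in the same
> JVM after deployment):
>
> import org.apache.predictionio.controller.PAlgorithm
> import org.apache.spark.SparkContext
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.tree.RandomForest
>
> class SomeAlgorithm(val ap: SomeAlgorithmParams)
>   extends PAlgorithm[PreparedData, SomeModel, Query, PredictedResult] {
>
>   // Kept from train() for later use in predict():
>   @transient private var trainSc: Option[SparkContext] = None
>
>   def train(sc: SparkContext, data: PreparedData): SomeModel = {
>     trainSc = Some(sc)
>     new SomeModel(RandomForest.trainClassifier(data.points, 2,
>       Map.empty[Int, Int], 10, "auto", "gini", 5, 32))
>   }
>
>   def predict(model: SomeModel, query: Query): PredictedResult = {
>     // trainSc is available here for batch work if you truly need it
>     PredictedResult(model.randomForestModel.predict(
>       Vectors.dense(query.features)))
>   }
> }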
>
> Regards,
> Marcin
>
>
>
> On Fri, Sep 23, 2016 at 5:46 PM, Hasan Can Saral <hasancansa...@gmail.com> wrote:
>
>> Hi Marcin!
>>
>> Thank you for your answer.
>>
>> I only need SparkContext, but have no idea:
>> 1- How do I retrieve it from PersistentModelLoader?
>> 2- How do I access sc in the predict method using the configuration below?
>>
>> class SomeModel() extends LocalFileSystemPersistentModel[SomeAlgorithmParams] {
>>   override def save(id: String, params: SomeAlgorithmParams,
>>       sc: SparkContext): Boolean = {
>>     false
>>   }
>> }
>>
>> object SomeModel
>>   extends LocalFileSystemPersistentModelLoader[SomeAlgorithmParams, SomeModel] {
>>   override def apply(id: String, params: SomeAlgorithmParams,
>>       sc: Option[SparkContext]): SomeModel = {
>>     new SomeModel() // HERE I TRAIN AND RETURN THE TRAINED MODEL
>>   }
>> }
>>
>> Thank you very much, I really appreciate it!
>>
>> Hasan
>>
>>
>> On Thu, Sep 22, 2016 at 7:05 PM, Marcin Ziemiński <ziem...@gmail.com>
>> wrote:
>>
>>> Hi Hasan,
>>>
>>> I think that your problem comes from using a deserialized RDD, which has
>>> already lost its connection to the SparkContext.
>>> A similar case can be found here:
>>> http://stackoverflow.com/questions/29567247/serializing-rdd
>>>
>>> If you only really need SparkContext you could probably use the one
>>>