Re: Why does CacheBasedDataSet destroy the cache it is given

2020-11-04 Thread zaleslaw
Dear Courtney Robinson, 

Please describe any cases where you need to update the helper cache with
partitions, so that we can better understand the situation.

How and when are you going to clear these helper caches if an alternative
version of CacheBasedDataset (as you suggested in the first email) is
provided?

Also, could you please answer akorensh's question about the main data cache
behaviour? Is it actually destroyed in your tests or not?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Why does CacheBasedDataSet destroy the cache it is given

2020-08-21 Thread akorensh
Courtney,

The CacheBasedDataset.close() method below destroys only the helper cache
that is derived from the original data and used to train the model. It does
not touch the original data set.


@Override public void close() {
    // Destroy the helper cache derived from the original cache.
    datasetCache.destroy();
    // Remove helper data stored locally on a node.
    ComputeUtils.removeData(ignite, datasetId);
    // Remove the helper object used to make the model.
    ComputeUtils.removeLearningEnv(ignite, datasetId);
}

see:
https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/ml/dataset/CacheBasedDatasetExample.java

If you take the above example, remove the persons.destroy() statement, move
Ignite out of the try-with-resources block, run it, and connect via web
console, you will see that the original persons data set remains intact.

If for some reason you do need to keep the helper cache that was created to
train the model, do as follows:
1. Create your own class: MyCacheBasedDataset extends CacheBasedDataset.
2. Override the close() method.
This is not recommended for production, but could be useful for debugging
the models.
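
The subclass-and-override idea can be sketched without a running cluster. The
classes below (FakeCache, HelperDataset, DebugDataset) are hypothetical
stand-ins invented for this illustration and are not part of the Ignite API;
they only model the behaviour described above, where close() destroys a
helper cache and leaves the upstream cache alone.

```java
import java.util.HashMap;
import java.util.Map;

public class KeepHelperCacheSketch {
    /** Hypothetical stand-in for a cache; this is NOT the real IgniteCache API. */
    static class FakeCache {
        final Map<Integer, String> data = new HashMap<>();
        boolean destroyed;

        void destroy() {
            data.clear();
            destroyed = true;
        }
    }

    /** Hypothetical stand-in for CacheBasedDataset: builds a helper cache and destroys it on close(). */
    static class HelperDataset implements AutoCloseable {
        final FakeCache upstream;
        final FakeCache helper = new FakeCache();

        HelperDataset(FakeCache upstream) {
            this.upstream = upstream;
            // Derive helper data from the upstream cache; the upstream itself is only read.
            helper.data.putAll(upstream.data);
        }

        @Override public void close() {
            helper.destroy(); // only the helper cache is destroyed
        }
    }

    /** Debug-only variant: close() is overridden so the helper cache survives for inspection. */
    static class DebugDataset extends HelperDataset {
        DebugDataset(FakeCache upstream) {
            super(upstream);
        }

        @Override public void close() {
            // Intentionally empty: the caller is now responsible for destroying the helper cache.
        }
    }

    public static void main(String[] args) {
        FakeCache persons = new FakeCache();
        persons.data.put(1, "Mike");
        persons.data.put(2, "John");

        HelperDataset normal = new HelperDataset(persons);
        normal.close();
        System.out.println("normal helper destroyed: " + normal.helper.destroyed);

        DebugDataset debug = new DebugDataset(persons);
        debug.close();
        System.out.println("debug helper destroyed: " + debug.helper.destroyed);
        System.out.println("upstream entries: " + persons.data.size());
    }
}
```

With a real CacheBasedDataset the same caveat applies as in the sketch: once
close() no longer cleans up, something else has to destroy the helper cache,
otherwise helper caches accumulate with every training run.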


Thanks, Alex





Re: Why does CacheBasedDataSet destroy the cache it is given

2020-08-19 Thread Courtney Robinson
Hey,
Just seen this reply.
We have Ignite persistence enabled. The caches/tables are the primary
source of the data. That's the use case.
If we build an ML model from the data in a cache, Ignite's behaviour of
deleting the cache means we'll have lost that data.
We were just lucky this showed up in tests before it got anywhere near
production data.

In our case, we're pushing data into a cache continually and rebuilding the
model periodically.
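
As a rough illustration of that workflow (not our actual code), the sketch
below keeps writing into a source map while a scheduler rebuilds a "model"
from it. The map stands in for the persistent cache, and the mean of the
values stands in for real training; both are assumptions made only for this
example. The point is that every rebuild reads the source and must leave it
intact for the next one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicRebuildSketch {
    /** Hypothetical stand-in for the persistent source cache that ingestion keeps writing to. */
    static final Map<Integer, Double> source = new ConcurrentHashMap<>();

    /** Latest "model"; here just the mean of the values, a placeholder for real training. */
    static volatile double model = Double.NaN;

    /** Rebuilds the model from the current contents of the source without modifying it. */
    static void rebuild() {
        model = source.values().stream()
            .mapToDouble(Double::doubleValue)
            .average()
            .orElse(Double.NaN);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Rebuild every 100 ms while data keeps arriving.
        scheduler.scheduleAtFixedRate(PeriodicRebuildSketch::rebuild, 0, 100, TimeUnit.MILLISECONDS);

        for (int i = 1; i <= 5; i++) {
            source.put(i, (double) i); // continuous ingestion
            Thread.sleep(50);
        }

        scheduler.shutdown();
        scheduler.awaitTermination(1, TimeUnit.SECONDS);
        rebuild(); // one final rebuild over everything ingested so far
        System.out.println("entries=" + source.size() + ", model=" + model);
    }
}
```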

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) 


https://hypi.io


On Mon, Aug 3, 2020 at 5:28 PM zaleslaw  wrote:

> Dear Courtney Robinson, let's discuss here the possible behaviour of this
> CacheBased Dataset closing.
>
> When we designed this feature, we thought that all training-related data
> should be deleted from the caches and the model should be serialized or
> exported somewhere.
>
> What is your use case? Could you share some code or pseudo-code?
> How are you going to handle data after training?


Re: Why does CacheBasedDataSet destroy the cache it is given

2020-08-03 Thread zaleslaw
Dear Courtney Robinson, let's discuss here the possible behaviour of this
CacheBased Dataset closing.

When we designed this feature, we thought that all training-related data
should be deleted from the caches and the model should be serialized or
exported somewhere.

What is your use case? Could you share some code or pseudo-code?
How are you going to handle data after training?





Re: Why does CacheBasedDataSet destroy the cache it is given

2020-05-27 Thread akorensh
Hi,
This is the way CacheBasedDataset has been designed.
It was built with an eye toward training the implemented ML models:
https://apacheignite.readme.io/docs/model-updating

You are free to create an implementation to fit your needs.
Use these examples to test your design:
https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml/dataset


Thanks, Alex





Why does CacheBasedDataSet destroy the cache it is given

2020-05-27 Thread Courtney Robinson
Hi all,

The current CacheBasedDataset destroys the cache and all the data along with
it, and there is no option to turn this off.

https://github.com/apache/ignite/blob/master/modules/ml/src/main/java/org/apache/ignite/ml/dataset/impl/cache/CacheBasedDataset.java#L189

/** {@inheritDoc} */
@Override public void close() {
    datasetCache.destroy();
    ComputeUtils.removeData(ignite, datasetId);
    ComputeUtils.removeLearningEnv(ignite, datasetId);
}


Why does it do this?
It means that using SqlDatasetBuilder will result in the data being deleted
after training a model.
We had to work around this with

var datasetBuilder = new SqlDatasetBuilder(repo.getCtx().getIgnite(), cacheName, (k, v) -> {
    // ...
});
var wrapper = new DatasetBuilder() {
    @Override
    public Dataset build(LearningEnvironmentBuilder envBuilder,
                         PartitionContextBuilder partCtxBuilder,
                         PartitionDataBuilder partDataBuilder,
                         LearningEnvironment localLearningEnv) {
        var cbd = datasetBuilder.build(envBuilder, partCtxBuilder, partDataBuilder, localLearningEnv);
        return new DatasetWrapper(cbd) {
            @Override public void close() {
                System.out.println("Dataset closed");
                // DO NOT call close. Cache based data set deletes the data in
                // the cache like some mad man!
            }
        };
    }

    @Override
    public DatasetBuilder withUpstreamTransformer(UpstreamTransformerBuilder builder) {
        return datasetBuilder.withUpstreamTransformer(builder);
    }

    @Override
    public DatasetBuilder withFilter(IgniteBiPredicate filterToAdd) {
        return datasetBuilder.withFilter(filterToAdd);
    }
};

which works but seems very hacky.
Are we misusing the API somehow? The examples and docs do not mention or
indicate anything about this, as far as I've found.

Regards,
Courtney Robinson
Founder and CEO, Hypi
https://hypi.io