[jira] [Commented] (IGNITE-20216) Moving ML module to ignite-extensions

2023-09-11 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763612#comment-17763612
 ] 

Alexey Zinoviev commented on IGNITE-20216:
--

Everything is fine; it can be merged.

> Moving ML module to ignite-extensions
> -
>
> Key: IGNITE-20216
> URL: https://issues.apache.org/jira/browse/IGNITE-20216
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Daschinsky
>Assignee: Ivan Daschinsky
>Priority: Major
>  Labels: ise
> Fix For: 2.16
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> It is time to move this module to ignite extensions. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-20216) Moving ML module to ignite-extensions

2023-08-16 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754952#comment-17754952
 ] 

Alexey Zinoviev commented on IGNITE-20216:
--

Hi, as a PMC member and the maintainer of this module:
 
-1 for removal
+1 for moving it to an extension, if it stays compatible with Ignite and can be 
compiled separately from the other extension modules
 
Some facts:
 * nobody has updated it in the last 3 years; that's true
 * the classic ML algorithms have not changed in the last 3 years (we never 
supported DL as part of the module, and that is not a goal; Random Forest has 
not changed in the last 20 years), just like CSV parsing or JDBC
 * the TensorFlow integration was removed 3 years ago
 * some people contacted me a few weeks ago asking me to urgently fix or develop 
some features in Ignite ML, but I have no time to do it urgently
 * I have met companies that used Ignite ML in 2021 and 2022, including at one 
of my job interviews :)
 * I agree about the BLAS issue; it would be great if somebody could update it, 
and again, I could help with testing

 
I can help review the GitHub PR that moves the module to an extension; please 
assign it to me (@zaleslaw). I am on vacation now, so I could do it in 
September.

> Moving ML module to ignite-extensions
> -
>
> Key: IGNITE-20216
> URL: https://issues.apache.org/jira/browse/IGNITE-20216
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Daschinsky
>Assignee: Ivan Daschinsky
>Priority: Major
>  Labels: ise
> Fix For: 2.16
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It is time to move this module to ignite extensions. 





[jira] [Resolved] (IGNITE-13803) Scalar test failed due to incorrect Jackson dependency

2020-12-02 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev resolved IGNITE-13803.
--
Resolution: Fixed

> Scalar test failed due to incorrect Jackson dependency
> --
>
> Key: IGNITE-13803
> URL: https://issues.apache.org/jira/browse/IGNITE-13803
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.10
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It failed with:
> ```
> java.lang.ExceptionInInitializerError
> Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.10.3
> ```
>  
>  
> https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_ScalaExamples?branch=%3Cdefault%3E&buildTypeTab=overview&mode=builds#



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-13803) Scalar test failed due to incorrect Jackson dependency

2020-12-02 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242512#comment-17242512
 ] 

Alexey Zinoviev commented on IGNITE-13803:
--

After excluding the dependency from ignite-ml in the "example" POM, the RDD 
tests, the Scalar suite, the ML project and all examples run without errors.
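For context on the error: the "Incompatible Jackson version" message is commonly raised by Jackson add-on modules (for example jackson-module-scala, which we assume here is pulled in transitively by the Spark-related dependencies) when the jackson-databind version on the classpath does not match the major.minor version they were built for. The sketch below is a hypothetical, simplified version of such a check; `JacksonVersionCheck` and its methods are illustrative names, not Jackson's API, and the real module throws `JsonMappingException` rather than `IllegalStateException`. Excluding the conflicting dependency leaves a single databind version on the classpath, so a check like this passes.

```java
// Hypothetical sketch of a "same major.minor" compatibility check, similar in spirit
// to what Jackson add-on modules do on registration. Not Jackson's actual code or API.
final class JacksonVersionCheck {

    private JacksonVersionCheck() {
    }

    /** Extracts {major, minor} from a version string such as "2.10.3" or "2.9.10-SNAPSHOT". */
    static int[] majorMinor(String version) {
        String[] parts = version.split("[.-]");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    /** True when the databind version on the classpath has the module's major and minor version. */
    static boolean isCompatible(String moduleVersion, String databindVersion) {
        int[] module = majorMinor(moduleVersion);
        int[] databind = majorMinor(databindVersion);
        return module[0] == databind[0] && module[1] == databind[1];
    }

    /** Throws with the same message as seen on TeamCity when the two versions diverge. */
    static void register(String moduleVersion, String databindVersion) {
        if (!isCompatible(moduleVersion, databindVersion))
            throw new IllegalStateException("Incompatible Jackson version: " + databindVersion);
    }
}
```

For example, a module built for 2.9.x registered against databind 2.10.3 fails, while any 2.10.x module passes.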

> Scalar test failed due to incorrect Jackson dependency
> --
>
> Key: IGNITE-13803
> URL: https://issues.apache.org/jira/browse/IGNITE-13803
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.10
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It failed with:
> ```
> java.lang.ExceptionInInitializerError
> Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.10.3
> ```
>  
>  
> https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_ScalaExamples?branch=%3Cdefault%3E&buildTypeTab=overview&mode=builds#





[jira] [Created] (IGNITE-13803) Scalar test failed due to incorrect Jackson dependency

2020-12-02 Thread Alexey Zinoviev (Jira)
Alexey Zinoviev created IGNITE-13803:


 Summary: Scalar test failed due to incorrect Jackson dependency
 Key: IGNITE-13803
 URL: https://issues.apache.org/jira/browse/IGNITE-13803
 Project: Ignite
  Issue Type: Bug
  Components: ml
Affects Versions: 2.10
Reporter: Alexey Zinoviev
Assignee: Alexey Zinoviev
 Fix For: 2.10


It failed with:

```
java.lang.ExceptionInInitializerError
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.10.3
```

 

 

https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_ScalaExamples?branch=%3Cdefault%3E&buildTypeTab=overview&mode=builds#





[jira] [Resolved] (IGNITE-12337) [ML] Redesign the package structure

2020-12-02 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev resolved IGNITE-12337.
--
Resolution: Won't Fix

> [ML] Redesign the package structure
> ---
>
> Key: IGNITE-12337
> URL: https://issues.apache.org/jira/browse/IGNITE-12337
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Minor
> Fix For: 2.10
>
>
> The problem is the following: many classes and algorithms are located in 
> inappropriate places and are not grouped into high-level packages.





[jira] [Updated] (IGNITE-12337) [ML] Redesign the package structure

2020-12-02 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12337:
-
Fix Version/s: (was: 2.10)

> [ML] Redesign the package structure
> ---
>
> Key: IGNITE-12337
> URL: https://issues.apache.org/jira/browse/IGNITE-12337
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Minor
>
> The problem is the following: many classes and algorithms are located in 
> inappropriate places and are not grouped into high-level packages.





[jira] [Updated] (IGNITE-12288) [ML] Replace assert logic with exceptions

2020-12-02 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12288:
-
Fix Version/s: (was: 2.10)

> [ML] Replace assert logic with exceptions
> -
>
> Key: IGNITE-12288
> URL: https://issues.apache.org/jira/browse/IGNITE-12288
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Minor
>
> 1) Add exceptions instead of assert logic
> 2) Add tests for the proposed exceptions
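A minimal sketch of what item 1 could look like, assuming a typical precondition such as matching vector sizes; `VectorChecks.dot` is a hypothetical method, not an Ignite ML class. The point is that a Java `assert` is skipped entirely unless the JVM runs with `-ea`, whereas an explicit exception always fires.

```java
// Hypothetical example: a precondition that used to be an `assert`
// (a no-op unless the JVM runs with -ea) rewritten as an explicit exception.
final class VectorChecks {

    private VectorChecks() {
    }

    /**
     * Dot product of two vectors.
     * Before: {@code assert a.length == b.length;}, which does nothing in production.
     */
    static double dot(double[] a, double[] b) {
        if (a == null || b == null)
            throw new IllegalArgumentException("Vectors must not be null");
        if (a.length != b.length)
            throw new IllegalArgumentException("Vector sizes differ: " + a.length + " != " + b.length);

        double res = 0.0;
        for (int i = 0; i < a.length; i++)
            res += a[i] * b[i];
        return res;
    }
}
```

For item 2, a test would call `dot` with mismatched sizes and expect `IllegalArgumentException`; unlike a test relying on `AssertionError`, it passes regardless of the `-ea` flag.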





[jira] [Updated] (IGNITE-12079) [ML][Umbrella] Add advanced preprocessing techniques

2020-12-02 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12079:
-
Fix Version/s: (was: 2.10)

> [ML][Umbrella] Add advanced preprocessing techniques
> 
>
> Key: IGNITE-12079
> URL: https://issues.apache.org/jira/browse/IGNITE-12079
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
>
> *Main goal:*
> To reduce the gap between Apache Spark and Apache Ignite in preprocessing 
> operations. Reducing the gap could help with loading Spark ML pipelines 
> into Ignite ML.
>  
> Next steps:
>  # Add Frequency Encoder
>  # Add imputing strategies (MIN, MAX, COUNT, MOST_FREQUENT, 
> LEAST_FREQUENT)
>  # Add RobustScaler (will be added in Spark 3.0)
>  # Add CountVectorizer
>  # Add FeatureHasher
>  # Add QuantileDiscretizer
>  # Add Locality Sensitive Hashing (LSH)
>  # Add LabelEncoder
>  # Add RevertStringIndexing
>  # Add multi-column preprocessor
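A minimal, library-free sketch of the imputing strategies from step 2, assuming missing values are encoded as `NaN` and that COUNT means "fill with the number of non-missing values in the column" (an assumption; the issue does not define it). `ImputerSketch` and its `Strategy` enum are hypothetical names, not Ignite ML's preprocessing API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-column imputing with the strategies listed in step 2.
// Missing values are assumed to be encoded as NaN.
final class ImputerSketch {

    enum Strategy { MIN, MAX, COUNT, MOST_FREQUENT, LEAST_FREQUENT }

    private ImputerSketch() {
    }

    /** Computes the value used to fill missing entries; NaN entries are ignored in the statistics. */
    static double fillValue(double[] column, Strategy strategy) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        long cnt = 0;
        Map<Double, Long> freq = new HashMap<>();

        for (double v : column) {
            if (Double.isNaN(v))
                continue;
            min = Math.min(min, v);
            max = Math.max(max, v);
            cnt++;
            freq.merge(v, 1L, Long::sum);
        }

        if (cnt == 0 && strategy != Strategy.COUNT)
            return Double.NaN; // nothing to learn from an all-missing column

        switch (strategy) {
            case MIN: return min;
            case MAX: return max;
            case COUNT: return cnt;
            case MOST_FREQUENT: return byFrequency(freq, true);
            case LEAST_FREQUENT: return byFrequency(freq, false);
            default: throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }

    /** Most (or least) frequent value; ties are broken by the smaller value to stay deterministic. */
    private static double byFrequency(Map<Double, Long> freq, boolean most) {
        double best = Double.NaN;
        long bestCnt = most ? Long.MIN_VALUE : Long.MAX_VALUE;
        for (Map.Entry<Double, Long> e : freq.entrySet()) {
            long c = e.getValue();
            boolean better = most ? c > bestCnt : c < bestCnt;
            if (better || (c == bestCnt && e.getKey() < best)) {
                best = e.getKey();
                bestCnt = c;
            }
        }
        return best;
    }

    /** Returns a copy of the column with every NaN replaced by the strategy's fill value. */
    static double[] impute(double[] column, Strategy strategy) {
        double fill = fillValue(column, strategy);
        double[] res = column.clone();
        for (int i = 0; i < res.length; i++)
            if (Double.isNaN(res[i]))
                res[i] = fill;
        return res;
    }
}
```

The fill value is computed once per column ("fit") and then applied ("transform"), which mirrors how Spark-style imputers separate the two phases.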





[jira] [Updated] (IGNITE-10426) [ML] Spread parameter isKeepRawLabels across all models

2020-12-02 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10426:
-
Fix Version/s: (was: 2.10)

> [ML] Spread parameter isKeepRawLabels across all models
> ---
>
> Key: IGNITE-10426
> URL: https://issues.apache.org/jira/browse/IGNITE-10426
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
>
> Currently, a few models have the isKeepRawLabels parameter and a threshold 
> that converts the predicted value to one of the class labels, 1 or 0.
> Discuss this on the dev list and think about how to solve this task in order 
> to optimize MultiClassModel.
> Possible solutions:
>  * add these methods to the common model
>  * add this method to MultiClassModel and use reflection to check this 
> parameter, for example in the apply method
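A minimal sketch of the first option ("add these methods to the common model"), assuming a binary model whose raw output is a probability. `ThresholdedBinaryModel` and `SimpleLogRegModel` are hypothetical names, not Ignite ML's model classes.

```java
// Hypothetical sketch: a shared interface carries isKeepRawLabels/threshold,
// so every binary model converts raw scores to 0/1 labels the same way.
interface ThresholdedBinaryModel {

    /** Raw model output, e.g. a probability in [0, 1]. */
    double rawScore(double[] features);

    /** When true, predict() returns the raw score instead of a class label. */
    boolean isKeepRawLabels();

    /** Scores strictly above this value map to label 1.0, the rest to 0.0. */
    double threshold();

    default double predict(double[] features) {
        double score = rawScore(features);
        if (isKeepRawLabels())
            return score;
        return score > threshold() ? 1.0 : 0.0;
    }
}

/** Logistic-regression-like model implementing the shared contract. */
final class SimpleLogRegModel implements ThresholdedBinaryModel {
    private final double[] weights;
    private final double intercept;
    private final boolean keepRawLabels;
    private final double threshold;

    SimpleLogRegModel(double[] weights, double intercept, boolean keepRawLabels, double threshold) {
        this.weights = weights.clone();
        this.intercept = intercept;
        this.keepRawLabels = keepRawLabels;
        this.threshold = threshold;
    }

    @Override public double rawScore(double[] features) {
        double z = intercept;
        for (int i = 0; i < weights.length; i++)
            z += weights[i] * features[i];
        return 1.0 / (1.0 + Math.exp(-z)); // sigmoid
    }

    @Override public boolean isKeepRawLabels() {
        return keepRawLabels;
    }

    @Override public double threshold() {
        return threshold;
    }
}
```

Assuming MultiClassModel picks the class whose binary model scores highest, it could call `rawScore()` through such an interface directly, which avoids the reflection mentioned in the second option.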





[jira] [Updated] (IGNITE-10870) [ML] Add an example for KNN/LogReg and multi-class task full Iris dataset

2020-12-02 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10870:
-
Fix Version/s: (was: 2.10)

> [ML] Add an example for KNN/LogReg and multi-class task full Iris dataset
> -
>
> Key: IGNITE-10870
> URL: https://issues.apache.org/jira/browse/IGNITE-10870
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Minor
>  Labels: newbie
>
> Add one or two examples for KNN/LogReg on the Iris dataset with 3 classes
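A minimal, library-free sketch of the core of such an example: brute-force k-NN with a majority vote over three classes. `KnnSketch` is a hypothetical name, not Ignite ML's KNN API, and the points used in the check below are made-up values loosely shaped like Iris petal length/width, not rows of the real dataset.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical brute-force k-NN classifier for small, in-memory multi-class data.
final class KnnSketch {

    private KnnSketch() {
    }

    /** Predicts the label of {@code query} by majority vote among its k nearest training points. */
    static int predict(double[][] features, int[] labels, double[] query, int k) {
        Integer[] idx = new Integer[features.length];
        for (int i = 0; i < idx.length; i++)
            idx[i] = i;

        // Order training points by squared Euclidean distance to the query.
        Arrays.sort(idx, Comparator.comparingDouble(j -> squaredDistance(features[j], query)));

        Map<Integer, Integer> votes = new HashMap<>();
        for (int i = 0; i < Math.min(k, idx.length); i++)
            votes.merge(labels[idx[i]], 1, Integer::sum);

        // Majority vote; ties go to the smaller label so the result is deterministic.
        int best = -1;
        int bestVotes = -1;
        for (Map.Entry<Integer, Integer> e : votes.entrySet()) {
            int lbl = e.getKey();
            int cnt = e.getValue();
            if (cnt > bestVotes || (cnt == bestVotes && lbl < best)) {
                best = lbl;
                bestVotes = cnt;
            }
        }
        return best;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

With k = 3 and three well-separated clusters labeled 0, 1 and 2, a query next to a cluster gets that cluster's label; a LogReg variant of the example would instead need a multi-class (e.g. one-vs-rest) wrapper.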





[jira] [Updated] (IGNITE-13672) [ML] Add initial JSON export/import support for all models

2020-12-02 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-13672:
-
Labels: important  (was: )

> [ML] Add initial JSON export/import support for all models
> --
>
> Key: IGNITE-13672
> URL: https://issues.apache.org/jira/browse/IGNITE-13672
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
>  Labels: important
> Fix For: 2.10
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This approach uses the capabilities of the JAXB project to import from and 
> export to a human-readable JSON format.
> Should include:
>  * Basic interfaces
>  * Implementations for all models
>  * Examples for all models (maybe export only)
>  * Tests with import/export to the temp directory





[jira] [Updated] (IGNITE-6642) [Umbrella] Model export/import to PMML and custom JSON format

2020-12-02 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-6642:

Fix Version/s: (was: 2.10)

> [Umbrella] Model export/import to PMML and custom JSON format
> -
>
> Key: IGNITE-6642
> URL: https://issues.apache.org/jira/browse/IGNITE-6642
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
>
>  
> We need to be able to export/import Ignite models across clusters running 
> different versions and to have an exchangeable, human-readable format for 
> inference in other systems like scikit-learn, Spark ML, etc.
> The PMML format is a good choice here: 
> PMML (Predictive Model Markup Language) is an XML-based language used in 
> Spark MLlib and other platforms.
> Here is some additional info about PMML:
> (i) [http://dmg.org/pmml/v4-3/GeneralStructure.html]
>  (i) [https://github.com/jpmml/jpmml-model]
>  
> But PMML has limited support for ensembles like Random Forest, Gradient 
> Boosted Trees, Stacking, Bagging and so on.
> These cases could be covered by our own JSON format, which other systems 
> could easily parse.





[jira] [Updated] (IGNITE-6642) [Umbrella] Model export/import to PMML and custom JSON format

2020-12-02 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-6642:

Labels:   (was: important)

> [Umbrella] Model export/import to PMML and custom JSON format
> -
>
> Key: IGNITE-6642
> URL: https://issues.apache.org/jira/browse/IGNITE-6642
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>
>  
> We need to be able to export/import Ignite models across clusters running 
> different versions and to have an exchangeable, human-readable format for 
> inference in other systems like scikit-learn, Spark ML, etc.
> The PMML format is a good choice here: 
> PMML (Predictive Model Markup Language) is an XML-based language used in 
> Spark MLlib and other platforms.
> Here is some additional info about PMML:
> (i) [http://dmg.org/pmml/v4-3/GeneralStructure.html]
>  (i) [https://github.com/jpmml/jpmml-model]
>  
> But PMML has limited support for ensembles like Random Forest, Gradient 
> Boosted Trees, Stacking, Bagging and so on.
> These cases could be covered by our own JSON format, which other systems 
> could easily parse.





[jira] [Created] (IGNITE-13672) [ML] Add initial JSON export/import support for all models

2020-11-04 Thread Alexey Zinoviev (Jira)
Alexey Zinoviev created IGNITE-13672:


 Summary: [ML] Add initial JSON export/import support for all models
 Key: IGNITE-13672
 URL: https://issues.apache.org/jira/browse/IGNITE-13672
 Project: Ignite
  Issue Type: Sub-task
  Components: ml
Reporter: Alexey Zinoviev
Assignee: Alexey Zinoviev
 Fix For: 2.10


This approach uses the capabilities of the JAXB project to import from and 
export to a human-readable JSON format.

Should include:
 * Basic interfaces
 * Implementations for all models
 * Examples for all models (maybe export only)
 * Tests with import/export to the temp directory
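The issue plans to use JAXB. Purely to illustrate the shape of a basic export/import interface and a round-trip test through a temp file, here is a library-free sketch for a linear model. The class name, the JSON field names (`weights`, `intercept`) and the regex-based parsing are assumptions for illustration; a real implementation would rely on a proper JSON binding library rather than hand-written parsing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: exports/imports a linear model to/from a small, fixed JSON shape
// such as {"weights":[0.5,-1.25],"intercept":2.0}. Hand-written parsing is for
// illustration only; NaN and Infinity are not representable in JSON and are not handled.
final class LinearModelJson {
    final double[] weights;
    final double intercept;

    private static final Pattern SHAPE = Pattern.compile(
        "\\{\\s*\"weights\"\\s*:\\s*\\[([^\\]]*)\\]\\s*,\\s*\"intercept\"\\s*:\\s*([^}\\s]+)\\s*\\}");

    LinearModelJson(double[] weights, double intercept) {
        this.weights = weights.clone();
        this.intercept = intercept;
    }

    /** Serializes the model; Double.toString output is a valid JSON number for finite values. */
    String toJson() {
        StringBuilder sb = new StringBuilder("{\"weights\":[");
        for (int i = 0; i < weights.length; i++) {
            if (i > 0)
                sb.append(',');
            sb.append(weights[i]);
        }
        return sb.append("],\"intercept\":").append(intercept).append('}').toString();
    }

    /** Parses JSON produced by toJson() (whitespace between tokens is tolerated). */
    static LinearModelJson fromJson(String json) {
        Matcher m = SHAPE.matcher(json.trim());
        if (!m.matches())
            throw new IllegalArgumentException("Unexpected model JSON: " + json);

        String body = m.group(1).trim();
        double[] w = body.isEmpty()
            ? new double[0]
            : Arrays.stream(body.split(",")).mapToDouble(s -> Double.parseDouble(s.trim())).toArray();
        return new LinearModelJson(w, Double.parseDouble(m.group(2)));
    }

    /** Writes the model as UTF-8 JSON to the given file. */
    void exportTo(Path file) throws IOException {
        Files.write(file, toJson().getBytes(StandardCharsets.UTF_8));
    }

    /** Reads a model previously written by exportTo(). */
    static LinearModelJson importFrom(Path file) throws IOException {
        return fromJson(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}
```

A round-trip test in the spirit of the last bullet writes the model to `Files.createTempFile(...)`, reads it back and compares weights and intercept.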





[jira] [Updated] (IGNITE-13533) [ML] Tutorial examples runs more than 300000ms

2020-10-06 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-13533:
-
  Component/s: ml
Fix Version/s: 2.10

> [ML] Tutorial examples runs more than 300000ms
> --
>
> Key: IGNITE-13533
> URL: https://issues.apache.org/jira/browse/IGNITE-13533
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>
>   Test has been timed out [test=testExample, timeout=300000]
> It seems we have a race condition in the genetic parallel hyper-parameter tuning
>  
> {code:java}
> [12:22:10] :  [Step 4/5] Thread [name="test-runner-#31311%ml.TutorialStepByStepExampleSelfTest%", id=32007, state=RUNNABLE, blockCnt=1982, waitCnt=91727]
> [12:22:10] :  [Step 4/5] Thread [name="test-runner-#31311%ml.TutorialStepByStepExampleSelfTest%", id=32007, state=RUNNABLE, blockCnt=1982, waitCnt=91727]
> [12:22:10] :  [Step 4/5]         at java.lang.System.identityHashCode(Native Method)
> [12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream$HandleTable.hash(ObjectOutputStream.java:2360)
> [12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream$HandleTable.lookup(ObjectOutputStream.java:2293)
> [12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream$ReplaceTable.lookup(ObjectOutputStream.java:2399)
> [12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1113)
> [12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
> [12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
> [12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> [12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> [12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> [12:22:10] :  [Step 4/5]         at o.a.i.marshaller.jdk.JdkMarshaller.marshal0(JdkMarshaller.java:97)
> [12:22:10] :  [Step 4/5]         at o.a.i.marshaller.jdk.JdkMarshaller.marshal0(JdkMarshaller.java:109)
> [12:22:10] :  [Step 4/5]         at o.a.i.marshaller.AbstractNodeNameAwareMarshaller.marshal(AbstractNodeNameAwareMarshaller.java:56)
> [12:22:10] :  [Step 4/5]         at o.a.i.i.util.IgniteUtils.marshal(IgniteUtils.java:10505)
> [12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor$7.applyx(GridCacheProcessor.java:4952)
> [12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor$7.applyx(GridCacheProcessor.java:4933)
> [12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor.withBinaryContext(GridCacheProcessor.java:4978)
> [12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor.cloneCheckSerializable(GridCacheProcessor.java:4933)
> [12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor.prepareCacheChangeRequest(GridCacheProcessor.java:5036)
> [12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor.lambda$dynamicStartCache$26(GridCacheProcessor.java:3472)
> [12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor$$Lambda$722/1638695311.apply(Unknown Source)
> [12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor.dynamicStartCache(GridCacheProcessor.java:3503)
> [12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor.dynamicStartCache(GridCacheProcessor.java:3408)
> [12:22:10] :  [Step 4/5]         at o.a.i.i.IgniteKernal.createCache(IgniteKernal.java:3191)
> [12:22:10] :  [Step 4/5]         at o.a.i.ml.dataset.impl.cache.CacheBasedDatasetBuilder.build(CacheBasedDatasetBuilder.java:151)
> [12:22:10] :  [Step 4/5]         at o.a.i.ml.dataset.impl.cache.CacheBasedDatasetBuilder.build(CacheBasedDatasetBuilder.java:43)
> [12:22:10] :  [Step 4/5]         at o.a.i.ml.selection.scoring.evaluator.Evaluator.evaluate(Evaluator.java:429)
> [12:22:10] :  [Step 4/5]         at o.a.i.ml.selection.cv.AbstractCrossValidation.score(AbstractCrossValidation.java:350)
> [12:22:10] :  [Step 4/5]         at o.a.i.ml.selection.cv.CrossValidation.scoreOnIgnite(CrossValidation.java:79)
> [12:22:10] :  [Step 4/5]         at o.a.i.ml.selection.cv.CrossValidation.scoreByFolds(CrossValidation.java:53)
> [12:22:10] :  [Step 4/5]         at o.a.i.ml.selection.cv.AbstractCrossValidation.calculateScoresForFixedParamSet(AbstractCrossValidation.java:294)
> [12:22:10] :  [Step 4/5]         at o.a.i.ml.selection.c

[jira] [Updated] (IGNITE-13533) [ML] Tutorial examples runs more than 300000ms

2020-10-06 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-13533:
-
Description: 
  Test has been timed out [test=testExample, timeout=300000]

It seems we have a race condition in the genetic parallel hyper-parameter tuning

 
{code:java}
[12:22:10] :  [Step 4/5] Thread [name="test-runner-#31311%ml.TutorialStepByStepExampleSelfTest%", id=32007, state=RUNNABLE, blockCnt=1982, waitCnt=91727]
[12:22:10] :  [Step 4/5] Thread [name="test-runner-#31311%ml.TutorialStepByStepExampleSelfTest%", id=32007, state=RUNNABLE, blockCnt=1982, waitCnt=91727]
[12:22:10] :  [Step 4/5]         at java.lang.System.identityHashCode(Native Method)
[12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream$HandleTable.hash(ObjectOutputStream.java:2360)
[12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream$HandleTable.lookup(ObjectOutputStream.java:2293)
[12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream$ReplaceTable.lookup(ObjectOutputStream.java:2399)
[12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1113)
[12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
[12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
[12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
[12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
[12:22:10] :  [Step 4/5]         at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
[12:22:10] :  [Step 4/5]         at o.a.i.marshaller.jdk.JdkMarshaller.marshal0(JdkMarshaller.java:97)
[12:22:10] :  [Step 4/5]         at o.a.i.marshaller.jdk.JdkMarshaller.marshal0(JdkMarshaller.java:109)
[12:22:10] :  [Step 4/5]         at o.a.i.marshaller.AbstractNodeNameAwareMarshaller.marshal(AbstractNodeNameAwareMarshaller.java:56)
[12:22:10] :  [Step 4/5]         at o.a.i.i.util.IgniteUtils.marshal(IgniteUtils.java:10505)
[12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor$7.applyx(GridCacheProcessor.java:4952)
[12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor$7.applyx(GridCacheProcessor.java:4933)
[12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor.withBinaryContext(GridCacheProcessor.java:4978)
[12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor.cloneCheckSerializable(GridCacheProcessor.java:4933)
[12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor.prepareCacheChangeRequest(GridCacheProcessor.java:5036)
[12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor.lambda$dynamicStartCache$26(GridCacheProcessor.java:3472)
[12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor$$Lambda$722/1638695311.apply(Unknown Source)
[12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor.dynamicStartCache(GridCacheProcessor.java:3503)
[12:22:10] :  [Step 4/5]         at o.a.i.i.processors.cache.GridCacheProcessor.dynamicStartCache(GridCacheProcessor.java:3408)
[12:22:10] :  [Step 4/5]         at o.a.i.i.IgniteKernal.createCache(IgniteKernal.java:3191)
[12:22:10] :  [Step 4/5]         at o.a.i.ml.dataset.impl.cache.CacheBasedDatasetBuilder.build(CacheBasedDatasetBuilder.java:151)
[12:22:10] :  [Step 4/5]         at o.a.i.ml.dataset.impl.cache.CacheBasedDatasetBuilder.build(CacheBasedDatasetBuilder.java:43)
[12:22:10] :  [Step 4/5]         at o.a.i.ml.selection.scoring.evaluator.Evaluator.evaluate(Evaluator.java:429)
[12:22:10] :  [Step 4/5]         at o.a.i.ml.selection.cv.AbstractCrossValidation.score(AbstractCrossValidation.java:350)
[12:22:10] :  [Step 4/5]         at o.a.i.ml.selection.cv.CrossValidation.scoreOnIgnite(CrossValidation.java:79)
[12:22:10] :  [Step 4/5]         at o.a.i.ml.selection.cv.CrossValidation.scoreByFolds(CrossValidation.java:53)
[12:22:10] :  [Step 4/5]         at o.a.i.ml.selection.cv.AbstractCrossValidation.calculateScoresForFixedParamSet(AbstractCrossValidation.java:294)
[12:22:10] :  [Step 4/5]         at o.a.i.ml.selection.cv.AbstractCrossValidation.lambda$scoreEvolutionAlgorithmSearchHyperparameterOptimization$0(AbstractCrossValidation.java:142)
[12:22:10] :  [Step 4/5]         at o.a.i.ml.selection.cv.AbstractCrossValidation$$Lambda$1731/1372309000.apply(Unknown Source)
[12:22:10] :  [Step 4/5]         at o.a.i.ml.util.genetic.Population.calculateFitnessForChromosome(Population.java:58)
[12:22:10] :  [Step 4/5]         at o.a.i.ml.util.genetic.GeneticAlgorithm.run(GeneticAlgorithm.java:118)
[12:22:10] :  [Step 4/5]         at o.a.i.ml.selection.cv.AbstractCrossValidation.scoreEvolutionAlgorithmSearchHyperparam

[jira] [Commented] (IGNITE-13532) [ML] Test DatasetAffinityFunctionWrapperTest failed with UnnecessaryStubbingException

2020-10-06 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208647#comment-17208647
 ] 

Alexey Zinoviev commented on IGNITE-13532:
--

I've run the ML visa because the changes are related only to the ML module.

Currently, the TC Bot run on master shows hundreds of broken tests and missing 
licenses.

> [ML] Test DatasetAffinityFunctionWrapperTest failed with 
> UnnecessaryStubbingException
> -
>
> Key: IGNITE-13532
> URL: https://issues.apache.org/jira/browse/IGNITE-13532
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Critical
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> NOTE: This does not reproduce locally, only on TC
>  
> org.mockito.exceptions.misusing.UnnecessaryStubbingException: Unnecessary 
> stubbings detected in test class: DatasetAffinityFunctionWrapperTest Clean & 
> maintainable test code requires zero unnecessary code. Following stubbings 
> are unnecessary (click to navigate to relevant line of code): 1. -> at 
> org.apache.ignite.ml.dataset.impl.cache.util.DatasetAffinityFunctionWrapperTest.testPartition(DatasetAffinityFunctionWrapperTest.java:80)
>  Please remove unnecessary stubbings or use 'lenient' strictness. More info: 
> javadoc for UnnecessaryStubbingException class.
>  org.mockito.exceptions.misusing.UnnecessaryStubbingException:
>  Unnecessary stubbings detected in test class: 
> DatasetAffinityFunctionWrapperTest
>  Clean & maintainable test code requires zero unnecessary code.
>  Following stubbings are unnecessary (click to navigate to relevant line of 
> code):
>  1. -> at 
> org.apache.ignite.ml.dataset.impl.cache.util.DatasetAffinityFunctionWrapperTest.testPartition(DatasetAffinityFunctionWrapperTest.java:80)
>  Please remove unnecessary stubbings or use 'lenient' strictness. More info: 
> javadoc for UnnecessaryStubbingException class.





[jira] [Created] (IGNITE-13533) [ML] Tutorial examples runs more than 300000ms

2020-10-06 Thread Alexey Zinoviev (Jira)
Alexey Zinoviev created IGNITE-13533:


 Summary: [ML] Tutorial examples runs more than 300000ms
 Key: IGNITE-13533
 URL: https://issues.apache.org/jira/browse/IGNITE-13533
 Project: Ignite
  Issue Type: Bug
Reporter: Alexey Zinoviev
Assignee: Alexey Zinoviev


  Test has been timed out [test=testExample, timeout=300000]

It seems we have a race condition in the genetic parallel hyper-parameter tuning
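The description only suspects a race condition. As a generic illustration of race-free parallel fitness evaluation (not a reconstruction of Ignite ML's `GeneticAlgorithm` and not the actual fix), the sketch below lets each task compute one chromosome's fitness and collects results through `ExecutorService.invokeAll`, which returns futures in submission order, so no shared mutable collection is written concurrently. `ParallelFitness` is a hypothetical name, and the fitness function is assumed to be thread-safe itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: evaluates the fitness of every chromosome in parallel
// without writing to shared mutable state from worker threads.
final class ParallelFitness {

    private ParallelFitness() {
    }

    static double[] evaluate(List<double[]> population, ToDoubleFunction<double[]> fitness, int threads)
        throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Double>> tasks = new ArrayList<>();
            for (double[] chromosome : population)
                tasks.add(() -> fitness.applyAsDouble(chromosome));

            // invokeAll blocks until every task finishes and keeps submission order,
            // so result i always belongs to chromosome i.
            List<Future<Double>> futures = pool.invokeAll(tasks);
            double[] res = new double[futures.size()];
            for (int i = 0; i < res.length; i++)
                res[i] = futures.get(i).get(); // rethrows a task failure as ExecutionException
            return res;
        }
        finally {
            pool.shutdown();
        }
    }
}
```

Collecting results by index on the calling thread keeps the per-generation bookkeeping single-threaded; only the fitness computation itself runs concurrently.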





[jira] [Created] (IGNITE-13532) [ML] Test DatasetAffinityFunctionWrapperTest failed with UnnecessaryStubbingException

2020-10-06 Thread Alexey Zinoviev (Jira)
Alexey Zinoviev created IGNITE-13532:


 Summary: [ML] Test DatasetAffinityFunctionWrapperTest failed with 
UnnecessaryStubbingException
 Key: IGNITE-13532
 URL: https://issues.apache.org/jira/browse/IGNITE-13532
 Project: Ignite
  Issue Type: Bug
  Components: ml
Reporter: Alexey Zinoviev
Assignee: Alexey Zinoviev
 Fix For: 2.10


org.mockito.exceptions.misusing.UnnecessaryStubbingException: Unnecessary 
stubbings detected in test class: DatasetAffinityFunctionWrapperTest Clean & 
maintainable test code requires zero unnecessary code. Following stubbings are 
unnecessary (click to navigate to relevant line of code): 1. -> at 
org.apache.ignite.ml.dataset.impl.cache.util.DatasetAffinityFunctionWrapperTest.testPartition(DatasetAffinityFunctionWrapperTest.java:80)
 Please remove unnecessary stubbings or use 'lenient' strictness. More info: 
javadoc for UnnecessaryStubbingException class.
org.mockito.exceptions.misusing.UnnecessaryStubbingException:
Unnecessary stubbings detected in test class: DatasetAffinityFunctionWrapperTest
Clean & maintainable test code requires zero unnecessary code.
Following stubbings are unnecessary (click to navigate to relevant line of 
code):
1. -> at 
org.apache.ignite.ml.dataset.impl.cache.util.DatasetAffinityFunctionWrapperTest.testPartition(DatasetAffinityFunctionWrapperTest.java:80)
Please remove unnecessary stubbings or use 'lenient' strictness. More info: 
javadoc for UnnecessaryStubbingException class.





[jira] [Updated] (IGNITE-13532) [ML] Test DatasetAffinityFunctionWrapperTest failed with UnnecessaryStubbingException

2020-10-06 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-13532:
-
Description: 
NOTE: This does not reproduce locally, only on TC

 

org.mockito.exceptions.misusing.UnnecessaryStubbingException: Unnecessary 
stubbings detected in test class: DatasetAffinityFunctionWrapperTest Clean & 
maintainable test code requires zero unnecessary code. Following stubbings are 
unnecessary (click to navigate to relevant line of code): 1. -> at 
org.apache.ignite.ml.dataset.impl.cache.util.DatasetAffinityFunctionWrapperTest.testPartition(DatasetAffinityFunctionWrapperTest.java:80)
 Please remove unnecessary stubbings or use 'lenient' strictness. More info: 
javadoc for UnnecessaryStubbingException class.
 org.mockito.exceptions.misusing.UnnecessaryStubbingException:
 Unnecessary stubbings detected in test class: 
DatasetAffinityFunctionWrapperTest
 Clean & maintainable test code requires zero unnecessary code.
 Following stubbings are unnecessary (click to navigate to relevant line of 
code):
 1. -> at 
org.apache.ignite.ml.dataset.impl.cache.util.DatasetAffinityFunctionWrapperTest.testPartition(DatasetAffinityFunctionWrapperTest.java:80)
 Please remove unnecessary stubbings or use 'lenient' strictness. More info: 
javadoc for UnnecessaryStubbingException class.

  was:
org.mockito.exceptions.misusing.UnnecessaryStubbingException: Unnecessary 
stubbings detected in test class: DatasetAffinityFunctionWrapperTest Clean & 
maintainable test code requires zero unnecessary code. Following stubbings are 
unnecessary (click to navigate to relevant line of code): 1. -> at 
org.apache.ignite.ml.dataset.impl.cache.util.DatasetAffinityFunctionWrapperTest.testPartition(DatasetAffinityFunctionWrapperTest.java:80)
 Please remove unnecessary stubbings or use 'lenient' strictness. More info: 
javadoc for UnnecessaryStubbingException class.
org.mockito.exceptions.misusing.UnnecessaryStubbingException:
Unnecessary stubbings detected in test class: DatasetAffinityFunctionWrapperTest
Clean & maintainable test code requires zero unnecessary code.
Following stubbings are unnecessary (click to navigate to relevant line of 
code):
1. -> at 
org.apache.ignite.ml.dataset.impl.cache.util.DatasetAffinityFunctionWrapperTest.testPartition(DatasetAffinityFunctionWrapperTest.java:80)
Please remove unnecessary stubbings or use 'lenient' strictness. More info: 
javadoc for UnnecessaryStubbingException class.


> [ML] Test DatasetAffinityFunctionWrapperTest failed with 
> UnnecessaryStubbingException
> -
>
> Key: IGNITE-13532
> URL: https://issues.apache.org/jira/browse/IGNITE-13532
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Critical
> Fix For: 2.10
>
>
> NOTE: This does not reproduce locally, but it does reproduce on TC (TeamCity).
>  
> org.mockito.exceptions.misusing.UnnecessaryStubbingException: Unnecessary 
> stubbings detected in test class: DatasetAffinityFunctionWrapperTest Clean & 
> maintainable test code requires zero unnecessary code. Following stubbings 
> are unnecessary (click to navigate to relevant line of code): 1. -> at 
> org.apache.ignite.ml.dataset.impl.cache.util.DatasetAffinityFunctionWrapperTest.testPartition(DatasetAffinityFunctionWrapperTest.java:80)
>  Please remove unnecessary stubbings or use 'lenient' strictness. More info: 
> javadoc for UnnecessaryStubbingException class.
>  org.mockito.exceptions.misusing.UnnecessaryStubbingException:
>  Unnecessary stubbings detected in test class: 
> DatasetAffinityFunctionWrapperTest
>  Clean & maintainable test code requires zero unnecessary code.
>  Following stubbings are unnecessary (click to navigate to relevant line of 
> code):
>  1. -> at 
> org.apache.ignite.ml.dataset.impl.cache.util.DatasetAffinityFunctionWrapperTest.testPartition(DatasetAffinityFunctionWrapperTest.java:80)
>  Please remove unnecessary stubbings or use 'lenient' strictness. More info: 
> javadoc for UnnecessaryStubbingException class.





[jira] [Commented] (IGNITE-13531) [ML] Code cleanup in Util classes

2020-10-06 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208597#comment-17208597
 ] 

Alexey Zinoviev commented on IGNITE-13531:
--

[~mrkandreev] please have a look: I've created a ticket for the clean-up edits
you suggested in the doc
[https://docs.google.com/document/d/1_oBgmNfu6YnuSxEg9e1ImyGSV-fgmHq4Ut-hPq2bakQ/edit?usp=sharing],
except for the cloning (I need to think about how to do it better, and I may do
it later myself).

> [ML] Code cleanup in Util classes
> -
>
> Key: IGNITE-13531
> URL: https://issues.apache.org/jira/browse/IGNITE-13531
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>
> *Suggest improvement to Util classes*
>  
> I suggest adding a final class modifier and a private constructor
> to the Util classes in Ignite ML. This is Sonar rule RSPEC-1118 (
> [https://rules.sonarsource.com/java/tag/design/RSPEC-1118]).
>  
> Motivation:
> Utility classes, which are collections of static members, are not meant to
> be instantiated. Even abstract utility classes, which can be extended,
> should not have public constructors. Java adds an implicit public
> constructor to every class which does not define at least one explicitly.
> Hence, at least one non-public constructor should be defined.
>  
> We can add this to:
>  * DistributedMetaStorageUtil.java
>  * ComputeUtils.java
>  * IgniteModelStorageUtil.java
>  * MapUtil.java
>  * MatrixUtil.java
>  * Utils.java
> Class JdbcThinSSLUtil.java already has a private constructor.
>  
> *Suggest adding Serializable to the Blas class*
> I found that class Blas (org.apache.ignite.ml.math) is not Serializable, yet
> its fields f2jBlas and nativeBlas are marked transient, which only has an
> effect on Serializable classes. So I suggest making Blas implement Serializable.
>  
> *Add final modifier to static inner fields in utils class*
> Motivation:
> This static field is public but not final, so it could be changed by malicious
> code or by accident from another package. Making the field final avoids this
> vulnerability.
>  
> For example replace:
> public static IgniteDifferentiableDoubleToDoubleFunction SIGMOID = new 
> IgniteDifferentiableDoubleToDoubleFunction() {
> }
> With:
> public static final IgniteDifferentiableDoubleToDoubleFunction SIGMOID = new 
> IgniteDifferentiableDoubleToDoubleFunction() {
> }
>  
> *Inefficient use of keySet iterator instead of entrySet*
> This method accesses the value of a Map entry, using a key that was retrieved 
> from a keySet iterator. It is more efficient to use an iterator on the 
> entrySet of the map, to avoid the Map.get(key) lookup.
>  
> A possible problem is the expected iteration order of the set.
>  
> For example:
> for (Integer bucket : hist.keySet()) {
> accum += hist.get(bucket);
> res.put(bucket, accum);
> }
>  
> *Can be replaced with single Arrays.fill method call*
> For example:
> for (int i = 0; i < mins.length; i++)
> mins[i] = Double.POSITIVE_INFINITY;
> Can be replaced with:
> Arrays.fill(mins, Double.POSITIVE_INFINITY);
> Found in:
>  * ImputerTrainer
>  * MaxAbsScalerTrainer
>  * MinMaxScalerTrainer





[jira] [Created] (IGNITE-13531) [ML] Code cleanup in Util classes

2020-10-06 Thread Alexey Zinoviev (Jira)
Alexey Zinoviev created IGNITE-13531:


 Summary: [ML] Code cleanup in Util classes
 Key: IGNITE-13531
 URL: https://issues.apache.org/jira/browse/IGNITE-13531
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexey Zinoviev
 Fix For: 2.10


*Suggest improvement to Util classes*

I suggest adding a final class modifier and a private constructor to the Util
classes in Ignite ML. This is Sonar rule RSPEC-1118 (
[https://rules.sonarsource.com/java/tag/design/RSPEC-1118]).

Motivation:
Utility classes, which are collections of static members, are not meant to
be instantiated. Even abstract utility classes, which can be extended,
should not have public constructors. Java adds an implicit public
constructor to every class that does not explicitly define at least one.
Hence, at least one non-public constructor should be defined.

We can add this to:
 * DistributedMetaStorageUtil.java
 * ComputeUtils.java
 * IgniteModelStorageUtil.java
 * MapUtil.java
 * MatrixUtil.java
 * Utils.java

Class JdbcThinSSLUtil.java already has a private constructor.
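A minimal sketch of the RSPEC-1118 pattern, using a hypothetical `ArrayUtilSketch` class rather than one of the Ignite ML classes listed above. The class is `final`, so it cannot be subclassed. Its only constructor is private, which replaces the implicit public one, so the class cannot be instantiated from outside.

```java
import java.util.Arrays;

/** Hypothetical utility class illustrating RSPEC-1118; not an actual Ignite ML class. */
final class ArrayUtilSketch {
    /** Private constructor: replaces Java's implicit public constructor. */
    private ArrayUtilSketch() {
        // Also guards against instantiation via reflection.
        throw new AssertionError("Utility class must not be instantiated");
    }

    /** Example static helper: sum of all elements. */
    static double sum(double[] values) {
        return Arrays.stream(values).sum();
    }
}
```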

 

*Suggest adding Serializable to the Blas class*

I found that class Blas (org.apache.ignite.ml.math) is not Serializable, yet
its fields f2jBlas and nativeBlas are marked transient, which only has an
effect on Serializable classes. So I suggest making Blas implement Serializable.
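A sketch of the usual pattern behind this suggestion, using a hypothetical `BlasHolder` class. The real `Blas` fields and their types are not reproduced, and `Backend` is a stand-in for a non-serializable BLAS backend. The class implements `Serializable` and marks the backend `transient`, so serialization skips it. The backend is re-created lazily after deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical stand-in for a BLAS backend that is not Serializable. */
class Backend {
    double dot(double[] x, double[] y) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++)
            s += x[i] * y[i];
        return s;
    }
}

/** Hypothetical holder: Serializable, with the backend excluded from the serialized form. */
class BlasHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    /** transient: skipped by Java serialization, so it is null after deserialization. */
    private transient Backend backend;

    /** Lazily (re)creates the backend, including after deserialization. */
    private Backend backend() {
        if (backend == null)
            backend = new Backend();
        return backend;
    }

    double dot(double[] x, double[] y) {
        return backend().dot(x, y);
    }

    /** Serializes and deserializes an instance, as sending it to another node would. */
    static BlasHolder roundTrip(BlasHolder src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(src);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (BlasHolder) in.readObject();
            }
        }
        catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }
}
```

Without `implements Serializable`, `writeObject` would throw `NotSerializableException`, whether or not the fields are transient.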

 

*Add final modifier to static inner fields in utils class*

Motivation:
This static field is public but not final, so it could be changed by malicious
code or by accident from another package. Making the field final avoids this
vulnerability.

For example, replace:
public static IgniteDifferentiableDoubleToDoubleFunction SIGMOID = new
IgniteDifferentiableDoubleToDoubleFunction() {
}

With:
public static final IgniteDifferentiableDoubleToDoubleFunction SIGMOID = new
IgniteDifferentiableDoubleToDoubleFunction() {
}
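To make the difference concrete, the sketch below uses `java.util.function.DoubleUnaryOperator` as a stand-in for Ignite's `IgniteDifferentiableDoubleToDoubleFunction`, whose interface is not reproduced here. The holder class `ActivationsSketch` is hypothetical. With `final`, assigning a new function to the constant from any other class fails at compile time.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical activation-function holder; DoubleUnaryOperator stands in for the Ignite type. */
final class ActivationsSketch {
    private ActivationsSketch() {
    }

    /** public static final: callers can use SIGMOID but cannot reassign it. */
    public static final DoubleUnaryOperator SIGMOID = x -> 1.0 / (1.0 + Math.exp(-x));

    // Without final, code in any package could do: ActivationsSketch.SIGMOID = x -> 0.0;
    // With final, such an assignment does not compile.
}
```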

 

*Inefficient use of keySet iterator instead of entrySet*

This method accesses the value of a Map entry using a key that was retrieved
from a keySet iterator. It is more efficient to iterate over the entrySet of
the map, which avoids the Map.get(key) lookup.

A possible problem is the expected iteration order of the set.

For example:
for (Integer bucket : hist.keySet()) {
    accum += hist.get(bucket);
    res.put(bucket, accum);
}
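A sketch of the entrySet version of the cumulative-histogram loop. It assumes `hist` maps bucket ids to counts; the exact types in the original code are not shown, and `HistogramSketch` is a hypothetical name. Iterating `entrySet()` returns the key and value together, so the extra `hist.get(bucket)` lookup per key disappears.

The ordering concern applies to both versions. A running sum is only meaningful when buckets are visited in ascending order, so this sketch requires a `SortedMap` such as `TreeMap`. A plain `HashMap` has unspecified iteration order.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class HistogramSketch {
    private HistogramSketch() {
    }

    /**
     * Builds a cumulative histogram. Takes a SortedMap so that buckets are
     * visited in ascending key order, which the running sum depends on.
     */
    static SortedMap<Integer, Double> cumulative(SortedMap<Integer, Double> hist) {
        SortedMap<Integer, Double> res = new TreeMap<>();
        double accum = 0.0;
        // One pass over the entries: key and value arrive together, no hist.get(bucket).
        for (Map.Entry<Integer, Double> e : hist.entrySet()) {
            accum += e.getValue();
            res.put(e.getKey(), accum);
        }
        return res;
    }
}
```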

 

*Can be replaced with single Arrays.fill method call*

For example:
for (int i = 0; i < mins.length; i++)
    mins[i] = Double.POSITIVE_INFINITY;

Can be replaced with:
Arrays.fill(mins, Double.POSITIVE_INFINITY);

Found in:
 * ImputerTrainer
 * MaxAbsScalerTrainer
 * MinMaxScalerTrainer
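A small sketch of the suggested replacement, shown in the context of a running per-feature minimum like the one a scaler trainer computes. `FillSketch` and its methods are hypothetical, not the actual ImputerTrainer or MinMaxScalerTrainer code. `Arrays.fill` behaves exactly like the loop and states the intent more directly.

```java
import java.util.Arrays;

final class FillSketch {
    private FillSketch() {
    }

    /** Returns per-feature minimum accumulators initialised to +infinity. */
    static double[] initMins(int featureCount) {
        double[] mins = new double[featureCount];
        // Equivalent to: for (int i = 0; i < mins.length; i++) mins[i] = Double.POSITIVE_INFINITY;
        Arrays.fill(mins, Double.POSITIVE_INFINITY);
        return mins;
    }

    /** Folds one feature row into the running minimums. */
    static void update(double[] mins, double[] row) {
        for (int i = 0; i < mins.length; i++)
            mins[i] = Math.min(mins[i], row[i]);
    }
}
```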





[jira] [Commented] (IGNITE-13386) [ML] Add more distances between two Vectors (Part 2)

2020-10-06 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208580#comment-17208580
 ] 

Alexey Zinoviev commented on IGNITE-13386:
--

[~mrkandreev] please move the ticket to the Patch Available status.

> [ML] Add more distances between two Vectors (Part 2)
> 
>
> Key: IGNITE-13386
> URL: https://issues.apache.org/jira/browse/IGNITE-13386
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Mark Andreev
>Priority: Minor
> Fix For: 2.10
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Mark suggested adding more distances; his letter on the topic is below:
> [http://apache-ignite-developers.2346864.n4.nabble.com/First-contribute-to-Ignite-ML-td48950.html]
> "Currently, Ignite supports only these distances
> (org.apache.ignite.ml.math.distances) :
> - ChebyshevDistance
> - CosineSimilarity
> - EuclideanDistance
> - HammingDistance
> - JaccardIndex
> - ManhattanDistance
> - MinkowskiDistance
> But in scipy (
> [https://docs.scipy.org/doc/scipy/reference/spatial.distance.html]) we can
> find at least:
> - BrayCurtis
> - Canberra
> - Jensen-Shannon
> - Seuclidean
> - Weighted Minkowski
> I can implement those and coverage with unit tests."
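For reference, here is a plain-Java sketch of two of the proposed distances, following the definitions in the SciPy documentation linked above:

* Bray-Curtis: sum |u_i - v_i| / sum |u_i + v_i|.
* Canberra: sum |u_i - v_i| / (|u_i| + |v_i|), with 0/0 terms counted as 0.

The sketch works on `double[]` rather than Ignite's `Vector` type and does not implement Ignite's `DistanceMeasure` interface. `DistancesSketch` is a hypothetical name.

```java
final class DistancesSketch {
    private DistancesSketch() {
    }

    /** Bray-Curtis: sum|u_i - v_i| / sum|u_i + v_i| (NaN if both vectors are all zeros). */
    static double brayCurtis(double[] u, double[] v) {
        checkLengths(u, v);
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < u.length; i++) {
            num += Math.abs(u[i] - v[i]);
            den += Math.abs(u[i] + v[i]);
        }
        return num / den;
    }

    /** Canberra: sum |u_i - v_i| / (|u_i| + |v_i|), skipping terms where both are zero. */
    static double canberra(double[] u, double[] v) {
        checkLengths(u, v);
        double sum = 0.0;
        for (int i = 0; i < u.length; i++) {
            double den = Math.abs(u[i]) + Math.abs(v[i]);
            if (den != 0.0)
                sum += Math.abs(u[i] - v[i]) / den;
        }
        return sum;
    }

    private static void checkLengths(double[] u, double[] v) {
        if (u.length != v.length)
            throw new IllegalArgumentException("Vectors must have the same length");
    }
}
```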





[jira] [Commented] (IGNITE-13392) Incorrect Vector::kNorm evaluation for odd powers

2020-10-06 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208562#comment-17208562
 ] 

Alexey Zinoviev commented on IGNITE-13392:
--

[~mrkandreev] Please move the ticket to the Patch Available status.

> Incorrect Vector::kNorm evaluation for odd powers
> -
>
> Key: IGNITE-13392
> URL: https://issues.apache.org/jira/browse/IGNITE-13392
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Reporter: Mark Andreev
>Assignee: Mark Andreev
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The current implementation of `Vector::kNorm` is incorrect: it omits the
> absolute value, so the result is wrong for odd powers whenever a component is
> negative.
> The current formula is 
> (`org.apache.ignite.ml.math.primitives.vector.AbstractVector:882`):
> {code:java}
> (\sum_{i}{x_i^p})^{1/p}
> {code}
> But the correct formula is:
> {code:java}
> (\sum_{i}{|x_i|^p})^{1/p}
> {code}
> We can verify this using lecture notes
> ([https://www.math.usm.edu/lambers/mat610/sum10/lecture2.pdf]) or using
> Wolfram Mathematica:
> {code:java}
> > Norm[{x, y, z}, p]
> (Abs[x]^p+Abs[y]^p+Abs[z]^p)^(1/p){code}
>   
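A minimal sketch contrasting the two formulas on a plain `double[]`. This is not the actual `AbstractVector` code, and `KNormSketch` is a hypothetical name.

For odd p, the formulas disagree as soon as a component is negative. For x = (-1, 2) and p = 3:

* The buggy version computes (-1 + 8)^(1/3) = 7^(1/3).
* The correct norm is (1 + 8)^(1/3) = 9^(1/3).

For even p, x^p = |x|^p, so both versions agree.

```java
final class KNormSketch {
    private KNormSketch() {
    }

    /** Variant described in the report: (sum x_i^p)^(1/p), without the absolute value. */
    static double kNormWithoutAbs(double[] x, double p) {
        double sum = 0.0;
        for (double xi : x)
            sum += Math.pow(xi, p);
        return Math.pow(sum, 1.0 / p);
    }

    /** Correct p-norm: (sum |x_i|^p)^(1/p). */
    static double kNorm(double[] x, double p) {
        double sum = 0.0;
        for (double xi : x)
            sum += Math.pow(Math.abs(xi), p);
        return Math.pow(sum, 1.0 / p);
    }
}
```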





[jira] [Updated] (IGNITE-13392) Incorrect Vector::kNorm evaluation for odd powers

2020-10-06 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-13392:
-
Fix Version/s: 2.10

> Incorrect Vector::kNorm evaluation for odd powers
> -
>
> Key: IGNITE-13392
> URL: https://issues.apache.org/jira/browse/IGNITE-13392
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Reporter: Mark Andreev
>Assignee: Mark Andreev
>Priority: Minor
> Fix For: 2.10
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The current implementation of `Vector::kNorm` is incorrect: it omits the
> absolute value, so the result is wrong for odd powers whenever a component is
> negative.
> The current formula is 
> (`org.apache.ignite.ml.math.primitives.vector.AbstractVector:882`):
> {code:java}
> (\sum_{i}{x_i^p})^{1/p}
> {code}
> But the correct formula is:
> {code:java}
> (\sum_{i}{|x_i|^p})^{1/p}
> {code}
> We can verify this using lecture notes
> ([https://www.math.usm.edu/lambers/mat610/sum10/lecture2.pdf]) or using
> Wolfram Mathematica:
> {code:java}
> > Norm[{x, y, z}, p]
> (Abs[x]^p+Abs[y]^p+Abs[z]^p)^(1/p){code}
>   





[jira] [Updated] (IGNITE-13392) [ML] Incorrect Vector::kNorm evaluation for odd powers

2020-10-06 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-13392:
-
Summary: [ML] Incorrect Vector::kNorm evaluation for odd powers  (was: 
Incorrect Vector::kNorm evaluation for odd powers)

> [ML] Incorrect Vector::kNorm evaluation for odd powers
> --
>
> Key: IGNITE-13392
> URL: https://issues.apache.org/jira/browse/IGNITE-13392
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Reporter: Mark Andreev
>Assignee: Mark Andreev
>Priority: Minor
> Fix For: 2.10
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The current implementation of `Vector::kNorm` is incorrect: it omits the
> absolute value, so the result is wrong for odd powers whenever a component is
> negative.
> The current formula is 
> (`org.apache.ignite.ml.math.primitives.vector.AbstractVector:882`):
> {code:java}
> (\sum_{i}{x_i^p})^{1/p}
> {code}
> But the correct formula is:
> {code:java}
> (\sum_{i}{|x_i|^p})^{1/p}
> {code}
> We can verify this using lecture notes
> ([https://www.math.usm.edu/lambers/mat610/sum10/lecture2.pdf]) or using
> Wolfram Mathematica:
> {code:java}
> > Norm[{x, y, z}, p]
> (Abs[x]^p+Abs[y]^p+Abs[z]^p)^(1/p){code}
>   





[jira] [Updated] (IGNITE-13386) [ML] Add more distances between two Vectors (Part 2)

2020-09-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-13386:
-
Ignite Flags: Docs Required,Release Notes Required  (was: Release Notes 
Required)

> [ML] Add more distances between two Vectors (Part 2)
> 
>
> Key: IGNITE-13386
> URL: https://issues.apache.org/jira/browse/IGNITE-13386
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Mark Andreev
>Priority: Minor
> Fix For: 2.10
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Mark suggested adding more distances; his letter on the topic is below:
> [http://apache-ignite-developers.2346864.n4.nabble.com/First-contribute-to-Ignite-ML-td48950.html]
> "Currently, Ignite supports only these distances
> (org.apache.ignite.ml.math.distances) :
> - ChebyshevDistance
> - CosineSimilarity
> - EuclideanDistance
> - HammingDistance
> - JaccardIndex
> - ManhattanDistance
> - MinkowskiDistance
> But in scipy (
> [https://docs.scipy.org/doc/scipy/reference/spatial.distance.html]) we can
> find at least:
> - BrayCurtis
> - Canberra
> - Jensen-Shannon
> - Seuclidean
> - Weighted Minkowski
> I can implement those and coverage with unit tests."





[jira] [Updated] (IGNITE-13386) [ML] Add more distances between two Vectors (Part 2)

2020-09-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-13386:
-
Fix Version/s: 2.10

> [ML] Add more distances between two Vectors (Part 2)
> 
>
> Key: IGNITE-13386
> URL: https://issues.apache.org/jira/browse/IGNITE-13386
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Mark Andreev
>Priority: Minor
> Fix For: 2.10
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Mark suggested adding more distances; his letter on the topic is below:
> [http://apache-ignite-developers.2346864.n4.nabble.com/First-contribute-to-Ignite-ML-td48950.html]
> "Currently, Ignite supports only these distances
> (org.apache.ignite.ml.math.distances) :
> - ChebyshevDistance
> - CosineSimilarity
> - EuclideanDistance
> - HammingDistance
> - JaccardIndex
> - ManhattanDistance
> - MinkowskiDistance
> But in scipy (
> [https://docs.scipy.org/doc/scipy/reference/spatial.distance.html]) we can
> find at least:
> - BrayCurtis
> - Canberra
> - Jensen-Shannon
> - Seuclidean
> - Weighted Minkowski
> I can implement those and coverage with unit tests."





[jira] [Updated] (IGNITE-13344) [ML] DummyVectorizer fails to extract label for coordinate with value "0.0" when backed by sparse vector

2020-09-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-13344:
-
Fix Version/s: 2.10

> [ML] DummyVectorizer fails to extract label for coordinate with value "0.0" 
> when backed by sparse vector
> 
>
> Key: IGNITE-13344
> URL: https://issues.apache.org/jira/browse/IGNITE-13344
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.8.1
>Reporter: Thilo-Alexander Ginkel
>Assignee: Alexey Zinoviev
>Priority: Minor
> Fix For: 2.10
>
>
> Given: A labeled DummyVectorizer:
>  
> {code:java}
> new DummyVectorizer()
>  .exclude(excludeCoordinates.stream().map(coord -> vectorLength + coord).toArray(Integer[]::new))
>  .labeled(labelCoord);
> {code}
> When extracting the label, the call hierarchy eventually reaches
> org.apache.ignite.ml.dataset.feature.extractor.impl.DummyVectorizer#feature,
> which returns null from val.getRaw when val is a sparse vector whose element
> at the requested label coordinate is 0.0. This causes the training job, which
> expects a non-null label, to fail:
> {code:java}
> org.apache.ignite.IgniteException: Remote job threw user exception (override or implement ComputeTask.result(..) method if you would like to have automatic failover for this exception): null
> org.apache.ignite.IgniteException: Remote job threw user exception (override or implement ComputeTask.result(..) method if you would like to have automatic failover for this exception): null
>     at org.apache.ignite.compute.ComputeTaskAdapter.result(ComputeTaskAdapter.java:102) ~[ignite-core-2.8.1.jar:2.8.1]
>     at org.apache.ignite.internal.processors.task.GridTaskWorker$5.apply(GridTaskWorker.java:1062) ~[ignite-core-2.8.1.jar:2.8.1]
>     at org.apache.ignite.internal.processors.task.GridTaskWorker$5.apply(GridTaskWorker.java:1055) ~[ignite-core-2.8.1.jar:2.8.1]
>     at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7037) ~[ignite-core-2.8.1.jar:2.8.1]
>     at org.apache.ignite.internal.processors.task.GridTaskWorker.result(GridTaskWorker.java:1055) ~[ignite-core-2.8.1.jar:2.8.1]
>     at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:862) ~[ignite-core-2.8.1.jar:2.8.1]
>     at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1146) ~[ignite-core-2.8.1.jar:2.8.1]
>     at org.apache.ignite.internal.processors.job.GridJobWorker.finishJob(GridJobWorker.java:961) ~[ignite-core-2.8.1.jar:2.8.1]
>     at org.apache.ignite.internal.processors.job.GridJobWorker.finishJob(GridJobWorker.java:809) ~[ignite-core-2.8.1.jar:2.8.1]
>     at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:659) ~[ignite-core-2.8.1.jar:2.8.1]
>     at org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:519) ~[ignite-core-2.8.1.jar:2.8.1]
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) ~[ignite-core-2.8.1.jar:2.8.1]
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[na:na]
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[na:na]
>     at java.base/java.lang.Thread.run(Thread.java:832) ~[na:na]
> Caused by: org.apache.ignite.IgniteException: null
>     at org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2.execute(GridClosureProcessor.java:1858) ~[ignite-core-2.8.1.jar:2.8.1]
>     at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:596) ~[ignite-core-2.8.1.jar:2.8.1]
>     at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7005) ~[ignite-core-2.8.1.jar:2.8.1]
>     at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:590) ~[ignite-core-2.8.1.jar:2.8.1]
>     ... 5 common frames omitted
> Caused by: java.lang.NullPointerException: null
>     at org.apache.ignite.ml.dataset.impl.bootstrapping.BootstrappedDatasetBuilder.build(BootstrappedDatasetBuilder.java:91) ~[ignite-ml-2.8.1.jar:2.8.1]
>     at org.apache.ignite.ml.dataset.impl.bootstrapping.BootstrappedDatasetBuilder.build(BootstrappedDatasetBuilder.java:41) ~[ignite-ml-2.8.1.jar:2.8.1]
>     at org.apache.ignite.ml.dataset.impl.cache.util.ComputeUtils.lambda$getData$4(ComputeUtils.java:239) ~[ignite-ml-2.8.1.jar:2.8.1]
>     at org.apache.ignite.ml.dataset.impl.cache.util.PartitionDataStorage.lambda$computeDataIfAbsent$1(PartitionDataStorage.java:56) ~[ignite-ml-2.8.1.jar:2.8.1]
>     at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.j
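The failure mode can be reproduced without Ignite. Sparse representations usually do not store coordinates equal to 0.0 at all, so a raw lookup returns `null` for them.

The sketch below uses a hypothetical map-backed `SparseVectorSketch`. It is not Ignite's `SparseVector` or `DummyVectorizer` API; the `getRaw` name only mirrors the ticket. It contrasts the raw lookup with one that treats a missing coordinate as the implicit 0.0.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical map-backed sparse vector: only non-zero coordinates are stored. */
final class SparseVectorSketch {
    private final int size;
    private final Map<Integer, Double> nonZeros = new HashMap<>();

    SparseVectorSketch(int size) {
        this.size = size;
    }

    void set(int idx, double val) {
        checkIndex(idx);
        if (val == 0.0)
            nonZeros.remove(idx); // zeros are implicit and never stored
        else
            nonZeros.put(idx, val);
    }

    /** Raw lookup: null for an implicit zero, which is what breaks label extraction. */
    Double getRaw(int idx) {
        checkIndex(idx);
        return nonZeros.get(idx);
    }

    /** Safe lookup: a missing coordinate is the implicit 0.0, never null. */
    double get(int idx) {
        checkIndex(idx);
        return nonZeros.getOrDefault(idx, 0.0);
    }

    private void checkIndex(int idx) {
        if (idx < 0 || idx >= size)
            throw new IndexOutOfBoundsException("Index " + idx + " out of range [0, " + size + ")");
    }
}
```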

[jira] [Updated] (IGNITE-10870) [ML] Add an example for KNN/LogReg and multi-class task full Iris dataset

2020-09-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10870:
-
Fix Version/s: (was: 3.0)
   2.10

> [ML] Add an example for KNN/LogReg and multi-class task full Iris dataset
> -
>
> Key: IGNITE-10870
> URL: https://issues.apache.org/jira/browse/IGNITE-10870
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 3.0
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Minor
> Fix For: 2.10
>
>
> Add one or two examples for KNN/LogReg on the Iris dataset with 3 classes
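As a possible starting point for such an example, here is a plain-Java sketch of the kNN decision rule for a multi-class label. It uses Euclidean distance and a majority vote among the k nearest training points. It does not use Ignite ML's trainer API and does not load the Iris data. `KnnSketch` is a hypothetical name, and the tie-breaking rule (smaller label wins) is an arbitrary choice made for determinism.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

final class KnnSketch {
    private KnnSketch() {
    }

    /** Predicts the label of x by majority vote among its k nearest training points. */
    static int predict(double[][] features, int[] labels, double[] x, int k) {
        if (k < 1 || k > features.length)
            throw new IllegalArgumentException("k must be in [1, " + features.length + "]");

        // Indices of training points, sorted by Euclidean distance to x.
        Integer[] idx = new Integer[features.length];
        for (int i = 0; i < idx.length; i++)
            idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> euclidean(features[i], x)));

        // Count labels among the k nearest points.
        Map<Integer, Integer> votes = new HashMap<>();
        for (int j = 0; j < k; j++)
            votes.merge(labels[idx[j]], 1, Integer::sum);

        // Pick the most frequent label; ties go to the smaller label.
        int best = Integer.MAX_VALUE;
        int bestVotes = -1;
        for (Map.Entry<Integer, Integer> e : votes.entrySet()) {
            int label = e.getKey();
            int count = e.getValue();
            if (count > bestVotes || (count == bestVotes && label < best)) {
                best = label;
                bestVotes = count;
            }
        }
        return best;
    }

    static double euclidean(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }
}
```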





[jira] [Updated] (IGNITE-10539) [ML] Make 'with' methods consistent

2020-09-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10539:
-
Reporter: Alexey Zinoviev  (was: Artem Malykh)

> [ML] Make 'with' methods consistent
> ---
>
> Key: IGNITE-10539
> URL: https://issues.apache.org/jira/browse/IGNITE-10539
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>
> In some places we have 'with*' methods that make in-place changes and return
> the object itself (for example, MLPTrainer::withLoss), while in other places
> they create new instances with the corresponding parameter changed (for
> example, DatasetBuilder::withFilter,
> DatasetBuilder::withUpstreamTransformer). This inconsistency forces users to
> look into the javadoc each time and lowers the overall consistency of the API.
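The two styles the ticket contrasts can be shown side by side with hypothetical `TrainerConfig` and `MutableTrainerConfig` classes, which are not Ignite types. The copy-on-write style leaves the receiver unchanged, like the ticket's description of `DatasetBuilder::withFilter`. The mutating style changes the receiver and returns it, like `MLPTrainer::withLoss`. A consistent API would pick one style everywhere.

```java
/** Hypothetical immutable configuration: 'with*' returns a modified copy. */
final class TrainerConfig {
    private final double learningRate;
    private final int maxIterations;

    TrainerConfig(double learningRate, int maxIterations) {
        this.learningRate = learningRate;
        this.maxIterations = maxIterations;
    }

    /** Copy-on-write: returns a new instance; 'this' is never modified. */
    TrainerConfig withLearningRate(double newRate) {
        return new TrainerConfig(newRate, maxIterations);
    }

    TrainerConfig withMaxIterations(int newMax) {
        return new TrainerConfig(learningRate, newMax);
    }

    double learningRate() { return learningRate; }

    int maxIterations() { return maxIterations; }
}

/** Hypothetical mutable configuration: 'with*' changes the receiver and returns it. */
final class MutableTrainerConfig {
    private double learningRate;

    MutableTrainerConfig withLearningRate(double newRate) {
        this.learningRate = newRate;
        return this;
    }

    double learningRate() { return learningRate; }
}
```

The immutable style is safer to share between threads and across trainers. The mutable style avoids allocations but makes aliasing bugs possible.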





[jira] [Updated] (IGNITE-10539) [ML] Make 'with' methods consistent

2020-09-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10539:
-
Fix Version/s: (was: 2.10)

> [ML] Make 'with' methods consistent
> ---
>
> Key: IGNITE-10539
> URL: https://issues.apache.org/jira/browse/IGNITE-10539
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
>
> In some places we have 'with*' methods that make in-place changes and return
> the object itself (for example, MLPTrainer::withLoss), while in other places
> they create new instances with the corresponding parameter changed (for
> example, DatasetBuilder::withFilter,
> DatasetBuilder::withUpstreamTransformer). This inconsistency forces users to
> look into the javadoc each time and lowers the overall consistency of the API.





[jira] [Updated] (IGNITE-10869) [ML] Add MultiClass classification metrics

2020-09-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10869:
-
Affects Version/s: (was: 2.9)

> [ML] Add MultiClass classification metrics
> --
>
> Key: IGNITE-10869
> URL: https://issues.apache.org/jira/browse/IGNITE-10869
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Minor
>
> Add the ability to calculate metrics for multiclass classification (as a set
> of per-class binary metrics).
> This could be combined with the OneVsRest approach.
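One way to read "binary metrics for multiclass classification" is the one-vs-rest reduction:

1. For each class c, treat c as positive and all other classes as negative.
2. Compute binary precision and recall for that split.
3. Average over classes (macro averaging).

Below is a plain-Java sketch under that assumption. It is independent of Ignite's metric classes, and `OneVsRestMetricsSketch` is a hypothetical name.

```java
final class OneVsRestMetricsSketch {
    private OneVsRestMetricsSketch() {
    }

    /** Precision with class c as positive; 0.0 when c is never predicted. */
    static double precision(int[] truth, int[] pred, int c) {
        int tp = 0;
        int fp = 0;
        for (int i = 0; i < truth.length; i++) {
            if (pred[i] == c) {
                if (truth[i] == c)
                    tp++;
                else
                    fp++;
            }
        }
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** Recall with class c as positive; 0.0 when c never occurs. */
    static double recall(int[] truth, int[] pred, int c) {
        int tp = 0;
        int fn = 0;
        for (int i = 0; i < truth.length; i++) {
            if (truth[i] == c) {
                if (pred[i] == c)
                    tp++;
                else
                    fn++;
            }
        }
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** Macro average: unweighted mean of per-class precision over classes 0..numClasses-1. */
    static double macroPrecision(int[] truth, int[] pred, int numClasses) {
        double sum = 0.0;
        for (int c = 0; c < numClasses; c++)
            sum += precision(truth, pred, c);
        return sum / numClasses;
    }

    /** Macro average of per-class recall. */
    static double macroRecall(int[] truth, int[] pred, int numClasses) {
        double sum = 0.0;
        for (int c = 0; c < numClasses; c++)
            sum += recall(truth, pred, c);
        return sum / numClasses;
    }
}
```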





[jira] [Updated] (IGNITE-10870) [ML] Add an example for KNN/LogReg and multi-class task full Iris dataset

2020-09-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10870:
-
Affects Version/s: (was: 3.0)

> [ML] Add an example for KNN/LogReg and multi-class task full Iris dataset
> -
>
> Key: IGNITE-10870
> URL: https://issues.apache.org/jira/browse/IGNITE-10870
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Minor
> Fix For: 2.10
>
>
> Add one or two examples for KNN/LogReg on the Iris dataset with 3 classes





[jira] [Updated] (IGNITE-10869) [ML] Add MultiClass classification metrics

2020-09-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10869:
-
Fix Version/s: (was: 3.0)

> [ML] Add MultiClass classification metrics
> --
>
> Key: IGNITE-10869
> URL: https://issues.apache.org/jira/browse/IGNITE-10869
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Affects Versions: 2.9
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Minor
>
> Add the ability to calculate metrics for multiclass classification (as a set
> of per-class binary metrics).
> This could be combined with the OneVsRest approach.





[jira] [Updated] (IGNITE-10870) [ML] Add an example for KNN/LogReg and multi-class task full Iris dataset

2020-09-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10870:
-
Labels: newbie  (was: )

> [ML] Add an example for KNN/LogReg and multi-class task full Iris dataset
> -
>
> Key: IGNITE-10870
> URL: https://issues.apache.org/jira/browse/IGNITE-10870
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Minor
>  Labels: newbie
> Fix For: 2.10
>
>
> Add one or two examples for KNN/LogReg on the Iris dataset with 3 classes





[jira] [Updated] (IGNITE-10503) [ML] Meta information for vectors

2020-09-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10503:
-
Reporter: Alexey Zinoviev  (was: Alexey Platonov)

> [ML] Meta information for vectors
> -
>
> Key: IGNITE-10503
> URL: https://issues.apache.org/jira/browse/IGNITE-10503
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
>
> We need to design and implement vector meta-information, such as feature
> names, bagging information, etc.





[jira] [Updated] (IGNITE-9414) [ML] Using sparce vectors in Tree-based algorithms.

2020-09-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-9414:

Fix Version/s: (was: 2.10)

> [ML] Using sparce vectors in Tree-based algorithms.
> ---
>
> Key: IGNITE-9414
> URL: https://issues.apache.org/jira/browse/IGNITE-9414
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
>
> We need to support sparse vectors in DecisionTrees, RF and GDB.





[jira] [Updated] (IGNITE-9415) [ML] Using sparce vectors in LSQR and MLP

2020-09-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-9415:

Fix Version/s: (was: 2.10)

> [ML] Using sparce vectors in LSQR and MLP
> -
>
> Key: IGNITE-9415
> URL: https://issues.apache.org/jira/browse/IGNITE-9415
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
>
> We need to investigate and apply sparse vector support in BLAS for LSQR and
> MLP (or implement our own version).
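As an illustration of what sparse support buys in BLAS-style kernels, here is a sketch of a sparse-dense dot product. The sparse vector is stored as parallel arrays of non-zero indices and values, so the cost grows with the number of stored non-zeros rather than with the vector length. Both the representation and the name `SparseDotSketch` are assumptions for illustration; this is not Ignite's `SparseVector` storage or an actual BLAS routine.

```java
final class SparseDotSketch {
    private SparseDotSketch() {
    }

    /**
     * Dot product of a sparse vector (parallel arrays of non-zero indices and values)
     * with a dense vector. Runs in O(nnz) instead of O(dense.length).
     */
    static double dot(int[] indices, double[] values, double[] dense) {
        if (indices.length != values.length)
            throw new IllegalArgumentException("indices and values must have the same length");
        double sum = 0.0;
        for (int k = 0; k < indices.length; k++)
            sum += values[k] * dense[indices[k]];
        return sum;
    }
}
```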





[jira] [Updated] (IGNITE-10503) [ML] Meta information for vectors

2020-09-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10503:
-
Fix Version/s: (was: 2.10)

> [ML] Meta information for vectors
> -
>
> Key: IGNITE-10503
> URL: https://issues.apache.org/jira/browse/IGNITE-10503
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Platonov
>Assignee: Alexey Zinoviev
>Priority: Major
>
> We need to design and implement vector meta-information, such as feature
> names, bagging information, etc.





[jira] [Updated] (IGNITE-12396) [ML] Random Forest generates NaN for a part of models on small datasets

2020-09-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12396:
-
Affects Version/s: (was: 2.8)

> [ML] Random Forest generates NaN for a part of models on small datasets
> ---
>
> Key: IGNITE-12396
> URL: https://issues.apache.org/jira/browse/IGNITE-12396
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>
> @Override public Double predict(Vector features) {
>  double[] predictions = new double[models.size()];
>  for (int i = 0; i < models.size(); i++)
>  predictions[i] = models.get(i).predict(features);
>  return predictionsAggregator.apply(predictions);
> }
>  
> predictionsAggregator receives predictions from many models, and some of them
> return null, yet they are still aggregated. First of all, handle this in the
> aggregator (using a threshold on the number of broken models before
> aggregation). Also, if RandomForest trees return Double.NaN, the training
> should fail or report a message once it finishes.
>  
> I've tested with 100 and 1000 rows, where it fails; it doesn't fail on
> 10 000 rows.
>  
> RF generates a few models consisting of a single LEAF node with an empty value
> (Double.NaN by default).
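Here is a sketch of the threshold idea described above:

* Before aggregating, drop predictions that are NaN, the default value of the degenerate single-leaf trees.
* Fail if the fraction of such broken models exceeds a configurable threshold.

`NanAwareAggregatorSketch` and `maxBrokenFraction` are hypothetical names. The mean is used only as an example aggregation; Ignite's actual aggregators, such as majority vote for classification, are not reproduced.

```java
import java.util.Arrays;

final class NanAwareAggregatorSketch {
    private final double maxBrokenFraction;

    /** @param maxBrokenFraction largest tolerated share of NaN predictions, in [0, 1]. */
    NanAwareAggregatorSketch(double maxBrokenFraction) {
        if (!(maxBrokenFraction >= 0.0 && maxBrokenFraction <= 1.0))
            throw new IllegalArgumentException("maxBrokenFraction must be in [0, 1]");
        this.maxBrokenFraction = maxBrokenFraction;
    }

    /** Averages the non-NaN predictions; throws if too many models returned NaN. */
    double apply(double[] predictions) {
        double[] valid = Arrays.stream(predictions).filter(p -> !Double.isNaN(p)).toArray();
        int broken = predictions.length - valid.length;
        if (valid.length == 0 || broken > maxBrokenFraction * predictions.length)
            throw new IllegalStateException(broken + " of " + predictions.length + " models returned NaN");
        return Arrays.stream(valid).average().getAsDouble();
    }
}
```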

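One possible shape for the aggregator-side handling is a mean aggregator that skips `Double.NaN` predictions and fails when the share of broken models exceeds a threshold. This is a sketch under assumptions: `NanAwareMeanAggregator` and `maxBrokenShare` are hypothetical names, broken models are assumed to report `Double.NaN` (the empty-leaf default mentioned above), and plain averaging is assumed as the aggregation rule.

```java
/**
 * Hypothetical NaN-aware mean aggregator for ensemble predictions.
 * A prediction equal to Double.NaN is treated as coming from a "broken" model.
 */
final class NanAwareMeanAggregator {
    /** Maximum allowed share of broken models, e.g. 0.1 for 10%. */
    private final double maxBrokenShare;

    NanAwareMeanAggregator(double maxBrokenShare) {
        if (maxBrokenShare < 0 || maxBrokenShare > 1)
            throw new IllegalArgumentException("maxBrokenShare must be in [0, 1]");
        this.maxBrokenShare = maxBrokenShare;
    }

    /** Averages the non-NaN predictions, or throws if too many models are broken. */
    double apply(double[] predictions) {
        double sum = 0;
        int valid = 0;
        for (double p : predictions) {
            if (!Double.isNaN(p)) {
                sum += p;
                valid++;
            }
        }
        int broken = predictions.length - valid;
        if (valid == 0 || broken > maxBrokenShare * predictions.length)
            throw new IllegalStateException(broken + " of " + predictions.length + " models returned NaN");
        return sum / valid;
    }
}
```

Failing loudly above the threshold surfaces the small-dataset problem instead of letting a single NaN tree turn the whole ensemble prediction into NaN.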




[jira] [Updated] (IGNITE-12396) [ML] Random Forest generates NaN for a part of models on small datasets

2020-09-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12396:
-
Fix Version/s: (was: 3.0)
   2.10

> [ML] Random Forest generates NaN for a part of models on small datasets
> ---
>
> Key: IGNITE-12396
> URL: https://issues.apache.org/jira/browse/IGNITE-12396
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.8
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>
> {code:java}
> @Override public Double predict(Vector features) {
>     double[] predictions = new double[models.size()];
> 
>     // One prediction per tree; a tree that is a single empty leaf yields Double.NaN.
>     for (int i = 0; i < models.size(); i++)
>         predictions[i] = models.get(i).predict(features);
> 
>     return predictionsAggregator.apply(predictions);
> }
> {code}
>  
> predictionsAggregator receives predictions from many models, and some of them 
> return null; the aggregation has to cope with that. First of all, handle this in 
> the aggregator (using a threshold on the number of broken models before 
> aggregation). Also, when RandomForest trees return Double.NaN, the training 
> should fail or report a message after it completes.
>  
> I've tested with 100 and 1000 rows: it fails on those, but doesn't fail on 10 000 rows.
>  
> RF generates a few models consisting of a single LEAF node with an empty value 
> (Double.NaN by default).





[jira] [Created] (IGNITE-13497) [ML] Tutorial examples fails with serialization error

2020-09-29 Thread Alexey Zinoviev (Jira)
Alexey Zinoviev created IGNITE-13497:


 Summary: [ML] Tutorial examples fails with serialization error
 Key: IGNITE-13497
 URL: https://issues.apache.org/jira/browse/IGNITE-13497
 Project: Ignite
  Issue Type: Bug
  Components: ml
Reporter: Alexey Zinoviev
Assignee: Alexey Zinoviev
 Fix For: 2.10


Cross-Validation uses non-serializable functions (DoubleConsumer, etc.) in its 
interfaces.

Add custom serializable functions and double-check all public interfaces for 
similar problems.

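A common way to make such functional parameters serializable in Java is to declare a functional interface that extends both the JDK interface and `java.io.Serializable`; lambdas whose target type is that interface are then serializable. A minimal sketch, where `SerializableDoubleConsumer` and `SerializationCheck` are hypothetical names rather than existing Ignite ML types:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.DoubleConsumer;

/** Hypothetical serializable counterpart of java.util.function.DoubleConsumer. */
@FunctionalInterface
interface SerializableDoubleConsumer extends DoubleConsumer, Serializable {
}

/** Hypothetical helper for spotting non-serializable parameters of public interfaces. */
final class SerializationCheck {
    private SerializationCheck() {
    }

    /** Returns true if Java serialization can write the object, false on NotSerializableException. */
    static boolean isSerializable(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        }
        catch (NotSerializableException e) {
            return false;
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A lambda assigned to a plain `DoubleConsumer` fails this check, while the same lambda assigned to `SerializableDoubleConsumer` passes. Everything the lambda captures must also be serializable, so capturing a non-serializable object would still fail.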
 





[jira] [Created] (IGNITE-13386) [ML] Add more distances between two Vectors (Part 2)

2020-08-26 Thread Alexey Zinoviev (Jira)
Alexey Zinoviev created IGNITE-13386:


 Summary: [ML] Add more distances between two Vectors (Part 2)
 Key: IGNITE-13386
 URL: https://issues.apache.org/jira/browse/IGNITE-13386
 Project: Ignite
  Issue Type: Sub-task
  Components: ml
Reporter: Alexey Zinoviev
Assignee: Mark Andreev


Mark suggested adding more distances; his letter on the topic is quoted below.

[http://apache-ignite-developers.2346864.n4.nabble.com/First-contribute-to-Ignite-ML-td48950.html]

"Currently, Ignite supports only these distances
(org.apache.ignite.ml.math.distances) :
- ChebyshevDistance
- CosineSimilarity
- EuclideanDistance
- HammingDistance
- JaccardIndex
- ManhattanDistance
- MinkowskiDistance

But in scipy (
[https://docs.scipy.org/doc/scipy/reference/spatial.distance.html]) we can
find at least:
- BrayCurtis
- Canberra
- Jensen-Shannon
- Seuclidean
- Weighted Minkowski

I can implement those and cover them with unit tests."

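Two of the proposed distances are simple to state: Bray-Curtis is sum|u_i - v_i| / sum|u_i + v_i|, and Canberra is the sum over i of |u_i - v_i| / (|u_i| + |v_i|). The sketch works on plain `double[]` arrays instead of Ignite's vector and distance-measure types, and the class name `ExtraDistances` is hypothetical. Following SciPy's documented behaviour, Canberra terms where both coordinates are zero contribute 0.

```java
/** Hypothetical distance helpers over plain double[] arrays (not Ignite's Vector types). */
final class ExtraDistances {
    private ExtraDistances() {
    }

    private static void checkSameLength(double[] u, double[] v) {
        if (u.length != v.length)
            throw new IllegalArgumentException("Vectors must have the same length");
    }

    /** Bray-Curtis: sum|u_i - v_i| / sum|u_i + v_i| (NaN if both vectors are all zeros). */
    static double brayCurtis(double[] u, double[] v) {
        checkSameLength(u, v);
        double num = 0;
        double den = 0;
        for (int i = 0; i < u.length; i++) {
            num += Math.abs(u[i] - v[i]);
            den += Math.abs(u[i] + v[i]);
        }
        return num / den;
    }

    /** Canberra: sum |u_i - v_i| / (|u_i| + |v_i|); 0/0 terms are treated as 0. */
    static double canberra(double[] u, double[] v) {
        checkSameLength(u, v);
        double res = 0;
        for (int i = 0; i < u.length; i++) {
            double den = Math.abs(u[i]) + Math.abs(v[i]);
            if (den != 0)
                res += Math.abs(u[i] - v[i]) / den;
        }
        return res;
    }
}
```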




[jira] [Updated] (IGNITE-10592) [ML] DatasetTrainer#update should be thought over.

2020-08-20 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10592:
-
Reporter: Alexey Zinoviev  (was: Artem Malykh)

> [ML] DatasetTrainer#update should be thought over.
> --
>
> Key: IGNITE-10592
> URL: https://issues.apache.org/jira/browse/IGNITE-10592
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 3.0
>
>
> DatasetTrainer#update was designed as a skeleton (template method) for updating 
> models, whereas the concrete update behaviour is implemented in subclasses by 
> overriding the skeleton's protected components, namely 
> DatasetTrainer#checkState and DatasetTrainer#updateModel.
> The problem is that if we retain the skeleton method, it should be 
> final. But making it final removes the possibility of writing wrappers around 
> a given DatasetTrainer, because then we will not be able to 
> implement Wrapper#checkState and Wrapper#updateModel by delegating to the wrapped 
> object (these methods have protected access). We need wrappers for stacking 
> and bagging, for example.
> Currently, wrappers can either
>  1. override the skeleton method, which does not seem to be a very clean solution, 
> since it is no longer a skeleton method and we lose the guarantee that checkState 
> and updateModel will be used at all; or
>  2. be placed in the same package as DatasetTrainer, but this forces 
> a not-so-good class structure.

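The access problem fits in a few lines. In this sketch (all names hypothetical, models and data reduced to strings), `update` is the final skeleton method and the wrapper delegates the protected steps to a wrapped trainer. The delegation compiles only because all classes share one package, which corresponds to option 2; from another package, a call such as `delegate.checkState(mdl)` is rejected, because a subclass may access a protected member of another object only through a reference of its own type.

```java
/** Hypothetical, heavily simplified stand-in for DatasetTrainer: models and data are Strings. */
abstract class SkeletonTrainer {
    /** Skeleton (template) method: final, so subclasses cannot bypass checkState/updateModel. */
    final String update(String mdl, String data) {
        return checkState(mdl) ? updateModel(mdl, data) : fit(data);
    }

    protected abstract boolean checkState(String mdl);

    protected abstract String updateModel(String mdl, String data);

    protected abstract String fit(String data);
}

/** Concrete trainer: "updating" appends the new data to the model. */
class StringTrainer extends SkeletonTrainer {
    @Override protected boolean checkState(String mdl) {
        return mdl != null;
    }

    @Override protected String updateModel(String mdl, String data) {
        return mdl + "+" + data;
    }

    @Override protected String fit(String data) {
        return data;
    }
}

/**
 * Wrapper (e.g. for bagging or stacking) that delegates the protected steps.
 * The delegate calls compile only because every class here is in the same package.
 */
class WrappingTrainer extends SkeletonTrainer {
    private final SkeletonTrainer delegate;

    WrappingTrainer(SkeletonTrainer delegate) {
        this.delegate = delegate;
    }

    @Override protected boolean checkState(String mdl) {
        return delegate.checkState(mdl);
    }

    @Override protected String updateModel(String mdl, String data) {
        return "wrapped(" + delegate.updateModel(mdl, data) + ")";
    }

    @Override protected String fit(String data) {
        return "wrapped(" + delegate.fit(data) + ")";
    }
}
```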




[jira] [Updated] (IGNITE-11192) [ML] Use nd4j for matrix inversions and determinants

2020-08-20 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-11192:
-
Reporter: Alexey Zinoviev  (was: Alexey Platonov)

> [ML] Use nd4j for matrix inversions and determinants
> 
>
> Key: IGNITE-11192
> URL: https://issues.apache.org/jira/browse/IGNITE-11192
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
>
> From an optimization point of view, we should use the matrix inversion and 
> determinant computations of dl4j instead of our own implementation.





[jira] [Updated] (IGNITE-11192) [ML] Use nd4j for matrix inversions and determinants

2020-08-20 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-11192:
-
Fix Version/s: (was: 3.0)

> [ML] Use nd4j for matrix inversions and determinants
> 
>
> Key: IGNITE-11192
> URL: https://issues.apache.org/jira/browse/IGNITE-11192
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Platonov
>Assignee: Alexey Zinoviev
>Priority: Major
>
> From an optimization point of view, we should use the matrix inversion and 
> determinant computations of dl4j instead of our own implementation.





[jira] [Updated] (IGNITE-10441) Fluent API refactoring.

2020-08-20 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10441:
-
Reporter: Alexey Zinoviev  (was: Artem Malykh)

> Fluent API refactoring.
> ---
>
> Key: IGNITE-10441
> URL: https://issues.apache.org/jira/browse/IGNITE-10441
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
>
> In many classes we have a fluent API ("with*" methods). The problem is that 
> these methods should return an instance of exactly their own class 
> (otherwise we get problems with subclasses: more precisely, if a "with" 
> method is declared in class A and class B extends A, the "with" method 
> (unless we override it) will return A). Currently we have opted to override 
> the "with" methods in subclasses. There is one solution that is probably more 
> elegant, but it involves a relatively complex generics construction that reduces 
> readability:
>  
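> {code:java}
> // Self is the concrete subclass type (recursive generics). Field types are illustrative.
> class A<Self extends A<Self>> {
>   protected int x;
> 
>   @SuppressWarnings("unchecked")
>   Self withX(int x) {
>     this.x = x;
> 
>     return (Self)this;
>   }
> }
> 
> class B<Self extends B<Self>> extends A<Self> {
>   protected String y;
> 
>   // No need to override "withX" here
>   @SuppressWarnings("unchecked")
>   Self withY(String y) {
>     this.y = y;
> 
>     return (Self)this;
>   }
> }
> 
> class C<Self extends C<Self>> extends B<Self> {
>   // No need to override "withX" and "withY" methods here.
> }
> //... etc
> {code}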

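A common variant of the same recursive-generics idea avoids the unchecked `(Self)this` cast: each concrete leaf class closes the recursion with its own type and implements an abstract `self()` method. This is a different (swapped-in) variant of the technique, and the class and method names are hypothetical, not taken from Ignite ML.

```java
/** Hypothetical base builder; Self is the concrete subclass type. */
abstract class TrainerBuilder<Self extends TrainerBuilder<Self>> {
    protected int maxDepth;

    /** Implemented by each concrete class as "return this", so no unchecked cast is needed. */
    protected abstract Self self();

    Self withMaxDepth(int maxDepth) {
        this.maxDepth = maxDepth;
        return self();
    }
}

/** Intermediate level: inherits withMaxDepth without overriding it. */
abstract class ForestBuilder<Self extends ForestBuilder<Self>> extends TrainerBuilder<Self> {
    protected int treeCnt;

    Self withTreeCount(int treeCnt) {
        this.treeCnt = treeCnt;
        return self();
    }
}

/** Concrete leaf class: closes the recursion with its own type. */
final class RandomForestBuilder extends ForestBuilder<RandomForestBuilder> {
    @Override protected RandomForestBuilder self() {
        return this;
    }
}
```

With this, `new RandomForestBuilder().withMaxDepth(5).withTreeCount(100)` compiles without overriding `withMaxDepth` in any subclass, because `withMaxDepth` already returns `RandomForestBuilder`.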




[jira] [Updated] (IGNITE-10746) [ML] Participate in TensorFlow 2.0 preparation

2020-08-20 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10746:
-
Fix Version/s: (was: 3.0)

> [ML] Participate in TensorFlow 2.0 preparation
> --
>
> Key: IGNITE-10746
> URL: https://issues.apache.org/jira/browse/IGNITE-10746
> Project: Ignite
>  Issue Type: Task
>  Components: ml, tensorflow
>Affects Versions: 2.7
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
>
> The next TensorFlow releases, starting from 2.0, introduce significant 
> structural changes: all code from the contrib module will be moved into 
> separate sub-projects. Our "TensorFlow on Apache Ignite" integration code in 
> the contrib module is also moving into the so-called "tensorflow/io" sub-project 
> (see [https://github.com/tensorflow/io]).
> Almost everything related to this move has already been done by community 
> members. We need to check that "TensorFlow on Apache Ignite" still works 
> after the move, and clarify the details of the "tensorflow/io" 
> review/build/publish procedures, including the Windows build, which is not 
> supported so far.





[jira] [Updated] (IGNITE-6707) .NET: Machine learning APIs

2020-08-20 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-6707:

Fix Version/s: (was: 3.0)

> .NET: Machine learning APIs
> ---
>
> Key: IGNITE-6707
> URL: https://issues.apache.org/jira/browse/IGNITE-6707
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml, platforms
>Reporter: Pavel Tupitsyn
>Assignee: Alexey Zinoviev
>Priority: Major
>  Labels: .NET
>
> Propagate ML APIs to .NET (see {{modules\ml\}} in Java).





[jira] [Updated] (IGNITE-11871) [ML] IP resolver in TensorFlow cluster manager doesn't work properly

2020-08-20 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-11871:
-
Fix Version/s: (was: 3.0)

> [ML] IP resolver in TensorFlow cluster manager doesn't work properly
> 
>
> Key: IGNITE-11871
> URL: https://issues.apache.org/jira/browse/IGNITE-11871
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.7, 2.8
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Critical
>
> The TensorFlow cluster manager requires a NodeId to be resolved into an IP address or 
> hostname in order to pass the address/name to the TensorFlow worker. Currently, it uses 
> the "return first" strategy and returns the first available address/name. As a 
> result, when a server has more than one network interface, the 
> cluster resolver might work incorrectly and return different addresses/names 
> for the same server.
> To fix this problem we need to update 
> [TensorFlowServerAddressSpec|https://github.com/apache/ignite/blob/master/modules/tensorflow/src/main/java/org/apache/ignite/tensorflow/cluster/spec/TensorFlowServerAddressSpec.java]
>  so that it always returns the same address/name for the same server. 
> If a server has multiple network interfaces, we need to find a "GCD": a 
> network that contains all Ignite nodes.

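A minimal sketch of the "GCD" idea: intersect the networks in which each node has an interface, then pick the same address per node every time by sorting instead of taking whichever address comes first. Assumptions to flag: addresses are IPv4 dotted-quad strings, every interface is treated as belonging to a /24 network, and `CommonNetworkResolver` is a hypothetical helper, not part of `TensorFlowServerAddressSpec`.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: picks one stable address per node inside a network shared by all nodes. */
final class CommonNetworkResolver {
    private CommonNetworkResolver() {
    }

    /** Assumption: IPv4 dotted-quad strings and a fixed /24 network (first three octets). */
    static String network24(String ipv4) {
        return ipv4.substring(0, ipv4.lastIndexOf('.'));
    }

    /** Maps node id to the chosen address; the same input always yields the same output. */
    static Map<String, String> resolve(Map<String, List<String>> addrsByNode) {
        // Intersect the networks in which every node has at least one interface (the "GCD").
        Set<String> common = null;
        for (List<String> addrs : addrsByNode.values()) {
            Set<String> nets = new HashSet<>();
            for (String addr : addrs)
                nets.add(network24(addr));
            if (common == null)
                common = nets;
            else
                common.retainAll(nets);
        }
        if (common == null || common.isEmpty())
            throw new IllegalStateException("No network contains all nodes");

        // Sorting (plain string order) replaces "return first", so the choice no longer
        // depends on the order in which the interfaces were reported.
        String net = new TreeSet<>(common).first();
        Map<String, String> res = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : addrsByNode.entrySet()) {
            String chosen = new TreeSet<>(e.getValue()).stream()
                .filter(addr -> network24(addr).equals(net))
                .findFirst()
                .orElseThrow(IllegalStateException::new);
            res.put(e.getKey(), chosen);
        }
        return res;
    }
}
```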




[jira] [Updated] (IGNITE-11342) [ML] Umbrella: Create a Python API for Ignite ML

2020-08-20 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-11342:
-
Priority: Minor  (was: Major)

> [ML] Umbrella: Create a Python API for Ignite ML
> 
>
> Key: IGNITE-11342
> URL: https://issues.apache.org/jira/browse/IGNITE-11342
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Minor
> Fix For: 3.0
>
>
> Currently Apache Ignite ML provides only a Java API. The most popular language 
> among data analysts is Python. To allow data analysts to work with Ignite ML, we 
> need to provide a Python API.
> The architecture of this Python API should be based on the 
> [Py4J|https://www.py4j.org/] library. This library allows starting a simple 
> server on the Java side, translating all calls from the Python API into calls 
> of the corresponding Java API, and interacting with the server via TCP.





[jira] [Updated] (IGNITE-11871) [ML] IP resolver in TensorFlow cluster manager doesn't work properly

2020-08-20 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-11871:
-
Priority: Minor  (was: Critical)

> [ML] IP resolver in TensorFlow cluster manager doesn't work properly
> 
>
> Key: IGNITE-11871
> URL: https://issues.apache.org/jira/browse/IGNITE-11871
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.7, 2.8
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Minor
>
> The TensorFlow cluster manager requires a NodeId to be resolved into an IP address or 
> hostname in order to pass the address/name to the TensorFlow worker. Currently, it uses 
> the "return first" strategy and returns the first available address/name. As a 
> result, when a server has more than one network interface, the 
> cluster resolver might work incorrectly and return different addresses/names 
> for the same server.
> To fix this problem we need to update 
> [TensorFlowServerAddressSpec|https://github.com/apache/ignite/blob/master/modules/tensorflow/src/main/java/org/apache/ignite/tensorflow/cluster/spec/TensorFlowServerAddressSpec.java]
>  so that it always returns the same address/name for the same server. 
> If a server has multiple network interfaces, we need to find a "GCD": a 
> network that contains all Ignite nodes.





[jira] [Commented] (IGNITE-13344) [ML] DummyVectorizer fails to extract label for coordinate with value "0.0" when backed by sparse vector

2020-08-10 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174122#comment-17174122
 ] 

Alexey Zinoviev commented on IGNITE-13344:
--

[~thilo.ginkel] Great, thanks! I'll take this into account and fix it in the next 
release.

> [ML] DummyVectorizer fails to extract label for coordinate with value "0.0" 
> when backed by sparse vector
> 
>
> Key: IGNITE-13344
> URL: https://issues.apache.org/jira/browse/IGNITE-13344
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.8.1
>Reporter: Thilo-Alexander Ginkel
>Assignee: Alexey Zinoviev
>Priority: Minor
>
> Given: A labeled DummyVectorizer:
>  
> {code:java}
> new DummyVectorizer()
>  .exclude(excludeCoordinates.stream().map(coord -> vectorLength + 
> coord).toArray(Integer[]::new))
>  .labeled(labelCoord);
> {code}
> When extracting the label, the call hierarchy eventually ends up at 
> {{org.apache.ignite.ml.dataset.feature.extractor.impl.DummyVectorizer#feature}}, 
> which returns null for {{val.getRaw}} when {{val}} is a sparse vector whose 
> element at the requested label coordinate is 0.0. This causes the training 
> job (which expects a non-null label) to fail:
> {code:java}
> org.apache.ignite.IgniteException: Remote job threw user exception (override 
> or implement ComputeTask.result(..) method if you would like to have 
> automatic failover for this exception): null
> org.apache.ignite.IgniteException: Remote job threw user exception 
> (override or implement ComputeTask.result(..) method if you would like to 
> have automatic failover for this exception): null at 
> org.apache.ignite.compute.ComputeTaskAdapter.result(ComputeTaskAdapter.java:102)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.task.GridTaskWorker$5.apply(GridTaskWorker.java:1062)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.task.GridTaskWorker$5.apply(GridTaskWorker.java:1055)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7037)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.task.GridTaskWorker.result(GridTaskWorker.java:1055)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:862)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1146)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.job.GridJobWorker.finishJob(GridJobWorker.java:961)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.job.GridJobWorker.finishJob(GridJobWorker.java:809)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:659)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:519)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) 
> ~[ignite-core-2.8.1.jar:2.8.1] at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>  ~[na:na] at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
>  ~[na:na] at java.base/java.lang.Thread.run(Thread.java:832) ~[na:na]
> Caused by: org.apache.ignite.IgniteException: null at 
> org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2.execute(GridClosureProcessor.java:1858)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:596)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7005)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:590)
>  ~[ignite-core-2.8.1.jar:2.8.1] ... 5 common frames omitted
> Caused by: java.lang.NullPointerException: null at 
> org.apache.ignite.ml.dataset.impl.bootstrapping.BootstrappedDatasetBuilder.build(BootstrappedDatasetBuilder.java:91)
>  ~[ignite-ml-2.8.1.jar:2.8.1] at 
> org.apache.ignite.ml.dataset.impl.bootstrapping.BootstrappedDatasetBuilder.build(BootstrappedDatasetBuilder.java:41)
>  ~[ignite-ml-2.8.1.jar:2.8.1] at 
> org.apache.ignite.ml.dataset.impl.cache.util.ComputeUtils.lambda$getData$4(ComputeUtils.java:239)
>  ~[ignite-ml-2.8.1.jar:2.8.1] at 
> org.apache.ignite.ml.dataset.impl.cache.util.PartitionDataStorage.lambda$computeDataIfAbsent$1(PartitionDataStorage.java:56)
>  ~[ignite-ml-2.8.1.jar:2.8.1] at 

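A plausible reduction of this failure mode: a sparse vector stores only non-zero coordinates, so a raw boxed lookup of a zero coordinate yields `null`, and unboxing that `null` into a primitive `double` throws `NullPointerException`. The sketch uses hypothetical `SparseVec` and `LabelExtraction` classes rather than Ignite's `SparseVector` and `DummyVectorizer`, and shows one possible fix: reading through an accessor that defaults missing coordinates to 0.0.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified sparse vector: only non-zero coordinates are stored. */
final class SparseVec {
    private final Map<Integer, Double> storage = new HashMap<>();
    private final int size;

    SparseVec(int size) {
        this.size = size;
    }

    SparseVec set(int i, double v) {
        checkIndex(i);
        if (v == 0.0)
            storage.remove(i); // Zeros are implicit and never stored.
        else
            storage.put(i, v);
        return this;
    }

    /** Raw boxed access: returns null for coordinates that are implicitly 0.0. */
    Double getRaw(int i) {
        checkIndex(i);
        return storage.get(i);
    }

    /** Safe access: treats a missing coordinate as the implicit 0.0. */
    double get(int i) {
        checkIndex(i);
        return storage.getOrDefault(i, 0.0);
    }

    private void checkIndex(int i) {
        if (i < 0 || i >= size)
            throw new IndexOutOfBoundsException("Index: " + i);
    }
}

final class LabelExtraction {
    private LabelExtraction() {
    }

    /** Mirrors the failure: unboxing a null raw value throws NullPointerException. */
    static double labelViaRaw(SparseVec v, int coord) {
        Double raw = v.getRaw(coord);
        return raw; // Auto-unboxing: NPE when raw == null.
    }

    /** Possible fix: read through the zero-defaulting accessor instead. */
    static double labelViaGet(SparseVec v, int coord) {
        return v.get(coord);
    }
}
```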
[jira] [Assigned] (IGNITE-13344) [ML] DummyVectorizer fails to extract label for coordinate with value "0.0" when backed by sparse vector

2020-08-10 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev reassigned IGNITE-13344:


Assignee: Alexey Zinoviev

> [ML] DummyVectorizer fails to extract label for coordinate with value "0.0" 
> when backed by sparse vector
> 
>
> Key: IGNITE-13344
> URL: https://issues.apache.org/jira/browse/IGNITE-13344
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.8.1
>Reporter: Thilo-Alexander Ginkel
>Assignee: Alexey Zinoviev
>Priority: Minor
>
> Given: A labeled DummyVectorizer:
>  
> {code:java}
> new DummyVectorizer()
>  .exclude(excludeCoordinates.stream().map(coord -> vectorLength + 
> coord).toArray(Integer[]::new))
>  .labeled(labelCoord);
> {code}
> When extracting the label, the call hierarchy eventually ends up at 
> {{org.apache.ignite.ml.dataset.feature.extractor.impl.DummyVectorizer#feature}}, 
> which returns null for {{val.getRaw}} when {{val}} is a sparse vector whose 
> element at the requested label coordinate is 0.0. This causes the training 
> job (which expects a non-null label) to fail:
> {code:java}
> org.apache.ignite.IgniteException: Remote job threw user exception (override 
> or implement ComputeTask.result(..) method if you would like to have 
> automatic failover for this exception): null
> org.apache.ignite.IgniteException: Remote job threw user exception 
> (override or implement ComputeTask.result(..) method if you would like to 
> have automatic failover for this exception): null at 
> org.apache.ignite.compute.ComputeTaskAdapter.result(ComputeTaskAdapter.java:102)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.task.GridTaskWorker$5.apply(GridTaskWorker.java:1062)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.task.GridTaskWorker$5.apply(GridTaskWorker.java:1055)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7037)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.task.GridTaskWorker.result(GridTaskWorker.java:1055)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:862)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1146)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.job.GridJobWorker.finishJob(GridJobWorker.java:961)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.job.GridJobWorker.finishJob(GridJobWorker.java:809)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:659)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:519)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) 
> ~[ignite-core-2.8.1.jar:2.8.1] at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>  ~[na:na] at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
>  ~[na:na] at java.base/java.lang.Thread.run(Thread.java:832) ~[na:na]
> Caused by: org.apache.ignite.IgniteException: null at 
> org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2.execute(GridClosureProcessor.java:1858)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:596)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7005)
>  ~[ignite-core-2.8.1.jar:2.8.1] at 
> org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:590)
>  ~[ignite-core-2.8.1.jar:2.8.1] ... 5 common frames omitted
> Caused by: java.lang.NullPointerException: null at 
> org.apache.ignite.ml.dataset.impl.bootstrapping.BootstrappedDatasetBuilder.build(BootstrappedDatasetBuilder.java:91)
>  ~[ignite-ml-2.8.1.jar:2.8.1] at 
> org.apache.ignite.ml.dataset.impl.bootstrapping.BootstrappedDatasetBuilder.build(BootstrappedDatasetBuilder.java:41)
>  ~[ignite-ml-2.8.1.jar:2.8.1] at 
> org.apache.ignite.ml.dataset.impl.cache.util.ComputeUtils.lambda$getData$4(ComputeUtils.java:239)
>  ~[ignite-ml-2.8.1.jar:2.8.1] at 
> org.apache.ignite.ml.dataset.impl.cache.util.PartitionDataStorage.lambda$computeDataIfAbsent$1(PartitionDataStorage.java:56)
>  ~[ignite-ml-2.8.1.jar:2.8.1] at 
> java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1708)
>  ~[na

[jira] [Commented] (IGNITE-11942) IGFS and Hadoop Accelerator Discontinuation

2020-07-06 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-11942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152093#comment-17152093
 ] 

Alexey Zinoviev commented on IGNITE-11942:
--

[~alex_pl] I'm not working on it, but I was a blocker for this ticket. As I 
mentioned above, it can be removed now (if somebody does it).

> IGFS and Hadoop Accelerator Discontinuation
> ---
>
> Key: IGNITE-11942
> URL: https://issues.apache.org/jira/browse/IGNITE-11942
> Project: Ignite
>  Issue Type: Task
>Reporter: Denis A. Magda
>Assignee: Anton Kalashnikov
>Priority: Blocker
> Fix For: 2.9
>
>
> The community has voted for the following decision:
> * IGFS and In-Memory Hadoop Accelerator components are to be discontinued and 
> no longer supported by the community 
> * The existing source code of IGFS and In-Memory Hadoop Accelerator is to be 
> removed from Ignite master. Before that, a special branch such as 
> "ignite-igfs-and-hadoop-accelerator" is to be forked off the master in order to 
> preserve the sources in Git history for those who might need it. 
> The voting thread:
> http://apache-ignite-developers.2346864.n4.nabble.com/VOTE-Complete-Discontinuation-of-IGFS-and-Hadoop-Accelerator-td42405.html
> Once the changes are made for Ignite 2.8, please contact Denis Magda to 
> update the public documentation.





[jira] [Resolved] (IGNITE-10292) ML: Replace IGFS by model storage for TensorFlow

2020-07-06 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev resolved IGNITE-10292.
--
Resolution: Not A Problem

The TensorFlow component was removed in 2.8, so there is no need to fix tickets 
related to TensorFlow.

> ML: Replace IGFS by model storage for TensorFlow
> 
>
> Key: IGNITE-10292
> URL: https://issues.apache.org/jira/browse/IGNITE-10292
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Critical
> Fix For: 3.0
>
>
> Currently we have a TensorFlow IGFS plugin that provides file system 
> functionality (see 
> https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/ignite).
>  At the same time, IGFS is deprecated, and it would be great to replace it with a 
> simple model storage based on a cache.





[jira] [Commented] (IGNITE-10292) ML: Replace IGFS by model storage for TensorFlow

2020-07-06 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152075#comment-17152075
 ] 

Alexey Zinoviev commented on IGNITE-10292:
--

[~alex_pl] Not an issue, I'll close the ticket. Thanks!

IGFS can be removed.

> ML: Replace IGFS by model storage for TensorFlow
> 
>
> Key: IGNITE-10292
> URL: https://issues.apache.org/jira/browse/IGNITE-10292
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Critical
> Fix For: 3.0
>
>
> Currently we have a TensorFlow IGFS plugin that provides file system 
> functionality (see 
> https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/ignite).
>  At the same time, IGFS is deprecated, and it would be great to replace it with a 
> simple model storage based on a cache.





[jira] [Resolved] (IGNITE-10782) javadoc description for ml.math.exceptions.preprocessing and ml.selection.scoring.evaluator

2020-06-29 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev resolved IGNITE-10782.
--
Resolution: Fixed

> javadoc description for ml.math.exceptions.preprocessing and 
> ml.selection.scoring.evaluator
> ---
>
> Key: IGNITE-10782
> URL: https://issues.apache.org/jira/browse/IGNITE-10782
> Project: Ignite
>  Issue Type: Bug
>  Components: documentation, ml
>Reporter: Stepan Pilschikov
>Assignee: Alexey Zinoviev
>Priority: Critical
> Fix For: 2.9
>
>
> We need to add package descriptions for:
>  - org.apache.ignite.ml.math.exceptions.preprocessing 
>  - org.apache.ignite.ml.selection.scoring.evaluator
> They are listed in ignite/docs/overview-summary.html.





[jira] [Updated] (IGNITE-12274) [ML] DecisionTree works incorrectly if maxDeep > amount of features

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12274:
-
Affects Version/s: (was: 2.8)

> [ML] DecisionTree works incorrectly if maxDeep > amount of features
> ---
>
> Key: IGNITE-12274
> URL: https://issues.apache.org/jira/browse/IGNITE-12274
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Blocker
> Fix For: 2.10
>
>
> We have a problem in two places: 
> null nodes could be created in the *MeanDecisionTreeLeafBuilder.createLeafNode* 
> method, in the line *return aa != null ? new DecisionTreeLeafNode(aa[0]) : 
> null;*
> This situation probably arises when the number of features is smaller 
> than maxDeep.

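One way to avoid creating null nodes is for the leaf builder to always return a leaf, falling back to a caller-supplied value when no samples reach it (for example, the parent node's mean label). This is a hypothetical sketch: `MeanLeafBuilder`, `LeafNode` and the fallback rule are assumptions, not the fix adopted in Ignite ML. The leaf value is computed as the mean label, as the name `MeanDecisionTreeLeafBuilder` suggests.

```java
/** Hypothetical, simplified leaf node holding a constant prediction. */
final class LeafNode {
    final double val;

    LeafNode(double val) {
        this.val = val;
    }
}

final class MeanLeafBuilder {
    private MeanLeafBuilder() {
    }

    /**
     * Builds a leaf predicting the mean label of the samples that reached it.
     * For an empty region it returns a leaf with the fallback value instead of null.
     */
    static LeafNode createLeafNode(double[] labels, double fallback) {
        if (labels == null || labels.length == 0)
            return new LeafNode(fallback);

        double sum = 0;
        for (double label : labels)
            sum += label;

        return new LeafNode(sum / labels.length);
    }
}
```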




[jira] [Updated] (IGNITE-12274) [ML] DecisionTree works incorrectly if maxDeep > amount of features

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12274:
-
Priority: Major  (was: Blocker)

> [ML] DecisionTree works incorrectly if maxDeep > amount of features
> ---
>
> Key: IGNITE-12274
> URL: https://issues.apache.org/jira/browse/IGNITE-12274
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>
> We have a problem in two places: 
> null nodes could be created in the *MeanDecisionTreeLeafBuilder.createLeafNode* 
> method, in the line *return aa != null ? new DecisionTreeLeafNode(aa[0]) : 
> null;*
> This situation probably arises when the number of features is smaller 
> than maxDeep.





[jira] [Updated] (IGNITE-12274) [ML] DecisionTree works incorrectly if maxDeep > amount of features

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12274:
-
Fix Version/s: (was: 2.9)
   2.10

> [ML] DecisionTree works incorrectly if maxDeep > amount of features
> ---
>
> Key: IGNITE-12274
> URL: https://issues.apache.org/jira/browse/IGNITE-12274
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.8
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Blocker
> Fix For: 2.10
>
>
> We have a problem in two places: 
> null nodes could be created in the *MeanDecisionTreeLeafBuilder.createLeafNode* 
> method, in the line *return aa != null ? new DecisionTreeLeafNode(aa[0]) : 
> null;*
> This situation probably arises when the number of features is smaller 
> than maxDeep.





[jira] [Resolved] (IGNITE-9740) [ML] Remove IgniteThread wrapper from ml unit test EvaluatorTest (follow up to IGNITE-9711)

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev resolved IGNITE-9740.
-
Resolution: Fixed

> [ML] Remove IgniteThread wrapper from ml unit test EvaluatorTest (follow up 
> to IGNITE-9711)
> ---
>
> Key: IGNITE-9740
> URL: https://issues.apache.org/jira/browse/IGNITE-9740
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Reporter: Oleg Ignatenko
>Assignee: Alexey Zinoviev
>Priority: Critical
> Fix For: 2.9
>
>
> [EvaluatorTest|https://github.com/apache/ignite/blob/master/modules/ml/src/test/java/org/apache/ignite/ml/selection/scoring/evaluator/EvaluatorTest.java]
>  involves {{IgniteThread}}, which is in fact not needed there and should be 
> removed.
> {{IgniteThread}} usage is a leftover / copy-paste from older tests and 
> examples that used an API requiring it. This API has been removed, and 
> there is no need for such wrapping anymore. For reference on how to 
> perform the suggested cleanup, check the changes made to the ML examples in IGNITE-9711.



[jira] [Commented] (IGNITE-9740) [ML] Remove IgniteThread wrapper from ml unit test EvaluatorTest (follow up to IGNITE-9711)

2020-06-26 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146194#comment-17146194
 ] 

Alexey Zinoviev commented on IGNITE-9740:
-

This test was removed; there is no IgniteThread usage in ML tests anymore.

> [ML] Remove IgniteThread wrapper from ml unit test EvaluatorTest (follow up 
> to IGNITE-9711)
> ---
>
> Key: IGNITE-9740
> URL: https://issues.apache.org/jira/browse/IGNITE-9740
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Reporter: Oleg Ignatenko
>Assignee: Alexey Zinoviev
>Priority: Critical
> Fix For: 2.9
>
>
> [EvaluatorTest|https://github.com/apache/ignite/blob/master/modules/ml/src/test/java/org/apache/ignite/ml/selection/scoring/evaluator/EvaluatorTest.java]
>  involves {{IgniteThread}}, which is in fact not needed there and should be 
> removed.
> {{IgniteThread}} usage is a leftover / copy-paste from older tests and 
> examples that used an API requiring it. This API has been removed, and 
> there is no need for such wrapping anymore. For reference on how to 
> perform the suggested cleanup, check the changes made to the ML examples in IGNITE-9711.



[jira] [Commented] (IGNITE-10782) javadoc description for ml.math.exceptions.preprocessing and ml.selection.scoring.evaluator

2020-06-26 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146193#comment-17146193
 ] 

Alexey Zinoviev commented on IGNITE-10782:
--

[~spilschikov] could you please have a look at the 2.8 or 2.8.1 release? It seems 
both package descriptions are in place.

> javadoc description for ml.math.exceptions.preprocessing and 
> ml.selection.scoring.evaluator
> ---
>
> Key: IGNITE-10782
> URL: https://issues.apache.org/jira/browse/IGNITE-10782
> Project: Ignite
>  Issue Type: Bug
>  Components: documentation, ml
>Reporter: Stepan Pilschikov
>Assignee: Alexey Zinoviev
>Priority: Critical
> Fix For: 2.9
>
>
> Need to add package descriptions for 
>  - org.apache.ignite.ml.math.exceptions.preprocessing 
>  - org.apache.ignite.ml.selection.scoring.evaluator
> Located in ignite/docs/overview-summary.html



[jira] [Commented] (IGNITE-12587) ML examples failed on start

2020-06-26 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146188#comment-17146188
 ] 

Alexey Zinoviev commented on IGNITE-12587:
--

[~agoncharuk] It was merged as part of a common PR 
[https://github.com/apache/ignite/pull/7430]

> ML examples failed on start
> ---
>
> Key: IGNITE-12587
> URL: https://issues.apache.org/jira/browse/IGNITE-12587
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.8
> Environment: Java 8
> Linux/Win
>Reporter: Stepan Pilschikov
>Assignee: Alexey Zinoviev
>Priority: Blocker
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The new 2.8 release build comes without the ML datasets.
> Steps:
> - Try to run any ML example that uses MLSandboxDatasets 
> (for example, org.apache.ignite.examples.ml.environment.TrainingWithCustomPreprocessorsExample)
> Actual:
> - FileNotFoundException
> {code}
> Exception in thread "main" java.io.FileNotFoundException: modules/ml/src/main/resources/datasets/boston_housing_dataset.txt
>   at org.apache.ignite.ml.util.SandboxMLCache.fillCacheWith(SandboxMLCache.java:119)
>   at org.apache.ignite.examples.ml.environment.TrainingWithCustomPreprocessorsExample.main(TrainingWithCustomPreprocessorsExample.java:62)
> {code}
> Release build - 
> https://ci.ignite.apache.org/viewLog.html?buildId=4957767&buildTypeId=Releases_ApacheIgniteMain_ReleaseBuild&tab=artifacts&branch_Releases_ApacheIgniteMain=ignite-2.8



[jira] [Commented] (IGNITE-12673) [ML] Fix ML examples logging

2020-06-26 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146186#comment-17146186
 ] 

Alexey Zinoviev commented on IGNITE-12673:
--

It was merged to master in [https://github.com/apache/ignite/pull/7430]

> [ML] Fix ML examples logging
> 
>
> Key: IGNITE-12673
> URL: https://issues.apache.org/jira/browse/IGNITE-12673
> Project: Ignite
>  Issue Type: Bug
>  Components: examples, ml
>Affects Versions: 2.8
>Reporter: Stepan Pilschikov
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A compilation of several minor fixes for the ML examples:
> 1. TutorialStepByStepExample runs 17 examples.
> Logging for the first 12 is good and follows the pattern "Tutorial step N: name" -> 
> model -> accuracy -> "Tutorial step N: completed".
> But starting with step 13 this pattern breaks: the step start and step 
> completion messages are missing.
> 2. Step_8_CV_with_Param_Grid_and_metrics_and_pipeline has no step 
> completion log.
> 3. The completion log for Step_9_Scaling_With_Stacking reads 'Tutorial step 5 
> (scaling) example completed'.



[jira] [Commented] (IGNITE-12657) ML examples EvaluatorExample and MultipleMetricsExample looks the same

2020-06-26 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146185#comment-17146185
 ] 

Alexey Zinoviev commented on IGNITE-12657:
--

It was merged to the master branch here [https://github.com/apache/ignite/pull/7430]

> ML examples EvaluatorExample and MultipleMetricsExample looks the same
> --
>
> Key: IGNITE-12657
> URL: https://issues.apache.org/jira/browse/IGNITE-12657
> Project: Ignite
>  Issue Type: Bug
>  Components: examples, ml
>Affects Versions: 2.8
>Reporter: Stepan Pilschikov
>Assignee: Alexey Zinoviev
>Priority: Blocker
> Fix For: 2.8
>
>
> The examples
> org.apache.ignite.examples.ml.selection.scoring.EvaluatorExample
> and 
> org.apache.ignite.examples.ml.selection.scoring.MultipleMetricsExample
> look exactly the same.
> I think MultipleMetricsExample is wrong: its description mentions 
> KNNClassificationTrainer, but it actually uses SVMLinearClassificationTrainer.



[jira] [Commented] (IGNITE-12658) [ML][Examples] TutorialStepByStepExample failed on cluster with more then 1 node

2020-06-26 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146182#comment-17146182
 ] 

Alexey Zinoviev commented on IGNITE-12658:
--

It was merged to the master branch in this PR 
[https://github.com/apache/ignite/pull/7430]

> [ML][Examples] TutorialStepByStepExample failed on cluster with more then 1 
> node
> 
>
> Key: IGNITE-12658
> URL: https://issues.apache.org/jira/browse/IGNITE-12658
> Project: Ignite
>  Issue Type: Bug
>  Components: examples, ml
>Affects Versions: 2.8
> Environment: Ubuntu/Win
> Java 8
>Reporter: Stepan Pilschikov
>Assignee: Alexey Zinoviev
>Priority: Critical
> Fix For: 2.8
>
>
> Steps to reproduce:
>  1. Run an Ignite node with org.apache.ignite.examples.ExampleNodeStartup (1 
> node is enough)
>  2. Run org.apache.ignite.examples.ml.tutorial.TutorialStepByStepExample
> Actual:
>  Step_8_CV_with_Param_Grid_and_metrics starts throwing a lot of 
> exceptions
> {code:java}
> Train with p: 2 and maxDeep: 1
> >>> Trained model: if (x1 > 0.4368) then return 1. else return 0.
> >>> Accuracy 0.7679083094555874
> >>> Test Error 0.2320916905444126
> >>> Tutorial step 8 (cross-validation) example completed.
> [13:25:40] Ignite node stopped OK [uptime=00:00:17.453]
> >>> Tutorial step 8 (cross-validation with param grid) example started.
> [13:25:40]__   
> [13:25:40]   /  _/ ___/ |/ /  _/_  __/ __/ 
> [13:25:40]  _/ // (7 7// /  / / / _/   
> [13:25:40] /___/\___/_/|_/___/ /_/ /___/  
> [13:25:40] 
> [13:25:40] ver. 2.8.0#20200130-sha1:f478aa56
> [13:25:40] 2020 Copyright(C) Apache Software Foundation
> [13:25:40] 
> [13:25:40] Ignite documentation: http://ignite.apache.org
> [13:25:40] 
> [13:25:40] Quiet mode.
> [13:25:40]   ^-- Logging to file 
> '/opt/buildagent/work/d501ae8146bd8253/i2test/var/suite-examples/app-ignite/work/log/ignite-e156b2f2.log'
> [13:25:40]   ^-- Logging by 'Log4JLogger [quiet=true, config=null]'
> [13:25:40]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or 
> "-v" to ignite.{sh|bat}
> [13:25:40] 
> [13:25:40] OS: Linux 4.15.0-65-generic amd64
> [13:25:40] VM information: Java(TM) SE Runtime Environment 1.8.0_221-b11 
> Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.221-b11
> [13:25:40] Please set system property '-Djava.net.preferIPv4Stack=true' to 
> avoid possible problems in mixed environments.
> [13:25:40] Configured plugins:
> [13:25:40]   ^-- ml-inference-plugin 1.0.0
> [13:25:40]   ^-- null
> [13:25:40] 
> [13:25:40] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler 
> [tryStop=false, timeout=0, super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT
> [13:25:40] Message queue limit is set to 0 which may lead to potential OOMEs 
> when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to 
> message queues growth on sender and receiver sides.
> [13:25:40] Security status [authentication=off, tls/ssl=off]
> [13:25:41] Performance suggestions for grid  (fix if possible)
> [13:25:41] To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
> [13:25:41]   ^-- Disable grid events (remove 'includeEventTypes' from 
> configuration)
> [13:25:41]   ^-- Enable G1 Garbage Collector (add '-XX:+UseG1GC' to JVM 
> options)
> [13:25:41]   ^-- Set max direct memory size if getting 'OOME: Direct buffer 
> memory' (add '-XX:MaxDirectMemorySize=[g|G|m|M|k|K]' to JVM options)
> [13:25:41]   ^-- Disable processing of calls to System.gc() (add 
> '-XX:+DisableExplicitGC' to JVM options)
> [13:25:41] Refer to this page for more performance suggestions: 
> https://apacheignite.readme.io/docs/jvm-and-system-tuning
> [13:25:41] 
> [13:25:41] To start Console Management & Monitoring run 
> ignitevisorcmd.{sh|bat}
> [13:25:41] Data Regions Configured:
> [13:25:41]   ^-- Default_Region [initSize=500.0 MiB, maxSize=18.9 GiB, 
> persistence=false, lazyMemoryAllocation=true]
> [13:25:41] 
> [13:25:41] Ignite node started OK (id=e156b2f2)
> [13:25:41] Topology snapshot [ver=20, locNode=e156b2f2, servers=2, clients=0, 
> state=ACTIVE, CPUs=5, offheap=38.0GB, heap=3.0GB]
> [13:25:41]   ^-- Baseline [id=0, size=2, online=2, offline=0]
> [2020-02-11 13:25:42,428][ERROR][sys-#593][GridTaskWorker] Failed to obtain 
> remote job result policy for result from ComputeTask.result(..) method (will 
> fail the whole task): GridJobResultImpl [job=C2 
> [c=o.a.i.ml.dataset.impl.cache.util.ComputeUtils$DeployableCallable@30e27659],
>  sib=GridJobSiblingImpl 
> [sesId=f9aced33071-e156b2f2-d116-4389-bd43-8536dc59, 
> jobId=1aaced33071-e156b2f2-d116-4389-bd43-8536dc59, 
> nodeId=f1135598-73c8-43

[jira] [Commented] (IGNITE-12660) [ML] The ParamGrid uses unserialized lambdas in interface to get an access to the trainer fields

2020-06-26 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146181#comment-17146181
 ] 

Alexey Zinoviev commented on IGNITE-12660:
--

[~agoncharuk] It was resolved in this PR 
[https://github.com/apache/ignite/pull/7430] and merged to master.

> [ML] The ParamGrid uses unserialized lambdas in interface to get an access to 
> the trainer fields
> 
>
> Key: IGNITE-12660
> URL: https://issues.apache.org/jira/browse/IGNITE-12660
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.8
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Blocker
> Fix For: 2.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




[jira] [Updated] (IGNITE-10539) [ML] Make 'with' methods consistent

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10539:
-
Priority: Major  (was: Critical)

> [ML] Make 'with' methods consistent
> ---
>
> Key: IGNITE-10539
> URL: https://issues.apache.org/jira/browse/IGNITE-10539
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Artem Malykh
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>
> In some places, 'with*' methods make in-place changes and return the 
> object itself (for example, MLPTrainer::withLoss), while in other places 
> they create new instances with the corresponding parameter changed (for 
> example, DatasetBuilder::withFilter, 
> DatasetBuilder::withUpstreamTransformer). This inconsistency forces users to look 
> into the javadoc each time and worsens the overall consistency of the API. 
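The two styles can be contrasted with a minimal sketch. The classes `MutableTrainer` and `ImmutableBuilder` are hypothetical stand-ins, not the real `MLPTrainer` or `DatasetBuilder` code.

```java
/** Style 1 (hypothetical): 'with' mutates this instance and returns it. */
final class MutableTrainer {
    private String loss = "mse";

    MutableTrainer withLoss(String loss) {
        this.loss = loss; // in-place change
        return this;
    }

    String loss() {
        return loss;
    }
}

/** Style 2 (hypothetical): 'with' returns a new instance; the original is untouched. */
final class ImmutableBuilder {
    private final String filter;

    ImmutableBuilder(String filter) {
        this.filter = filter;
    }

    ImmutableBuilder withFilter(String filter) {
        return new ImmutableBuilder(filter); // copy with one field changed
    }

    String filter() {
        return filter;
    }
}

final class WithStylesDemo {
    public static void main(String[] args) {
        MutableTrainer t = new MutableTrainer();
        t.withLoss("log");              // return value ignored
        System.out.println(t.loss());   // log: the call changed t

        ImmutableBuilder b = new ImmutableBuilder("none");
        b.withFilter("age > 18");       // return value ignored
        System.out.println(b.filter()); // none: b is unchanged
    }
}
```

The two calls look identical. In the second style, however, the call has no effect unless the result is reassigned. This is why users end up checking the javadoc for each method.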



[jira] [Updated] (IGNITE-12288) [ML] Replace assert logic with exceptions

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12288:
-
Priority: Minor  (was: Critical)

> [ML] Replace assert logic with exceptions
> -
>
> Key: IGNITE-12288
> URL: https://issues.apache.org/jira/browse/IGNITE-12288
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Minor
> Fix For: 2.10
>
>
> 1) Add exceptions instead of assert logic
> 2) Add tests for the proposed exceptions
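The proposed change can be sketched with a hypothetical validation helper (`ParamCheck` is not Ignite code). Java `assert` statements are skipped unless the JVM runs with `-ea`. An explicit exception, by contrast, is always thrown and is easy to cover in a unit test.

```java
/** Hypothetical example of replacing assert-based checks with exceptions. */
final class ParamCheck {
    /** Before: silently ignored unless the JVM is started with -ea. */
    static int withAssert(int maxDeep) {
        assert maxDeep > 0 : "maxDeep must be positive";
        return maxDeep;
    }

    /** After: always enforced, and the failure can be asserted in a test. */
    static int withException(int maxDeep) {
        if (maxDeep <= 0)
            throw new IllegalArgumentException("maxDeep must be positive, got " + maxDeep);

        return maxDeep;
    }

    public static void main(String[] args) {
        try {
            withException(0);
        }
        catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // maxDeep must be positive, got 0
        }
    }
}
```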



[jira] [Updated] (IGNITE-12054) [Umbrella][Spark] Upgrade Spark module to 2.4

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12054:
-
Priority: Major  (was: Blocker)

> [Umbrella][Spark] Upgrade Spark module to 2.4
> -
>
> Key: IGNITE-12054
> URL: https://issues.apache.org/jira/browse/IGNITE-12054
> Project: Ignite
>  Issue Type: New Feature
>  Components: spark
>Reporter: Denis A. Magda
>Assignee: Alexey Zinoviev
>Priority: Major
>  Labels: important
> Fix For: 3.0
>
> Attachments: ignite-spark-patch-new.diff
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Users can't use APIs that are already available in Spark 2.4:
> https://stackoverflow.com/questions/57392143/persisting-spark-dataframe-to-ignite
> Let's upgrade Spark from 2.3 to 2.4 until we extract the Spark Integration as 
> a separate module that can support multiple Spark versions.



[jira] [Updated] (IGNITE-6642) [Umbrella] Model export/import to PMML and custom JSON format

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-6642:

Fix Version/s: (was: 2.9)
   2.10

> [Umbrella] Model export/import to PMML and custom JSON format
> -
>
> Key: IGNITE-6642
> URL: https://issues.apache.org/jira/browse/IGNITE-6642
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>
>  
> We need to be able to export/import Ignite models across clusters 
> with different Ignite versions, and to have an exchangeable, human-readable 
> format for inference in other systems such as scikit-learn, Spark ML, etc.
> The PMML format is a good choice here: 
> PMML (Predictive Model Markup Language) is an XML-based language used in 
> Spark MLlib and other platforms.
> Here is some additional info about PMML:
> (i) [http://dmg.org/pmml/v4-3/GeneralStructure.html]
>  (i) [https://github.com/jpmml/jpmml-model]
>  
> But PMML has limited support for ensembles such as Random Forest, Gradient 
> Boosted Trees, Stacking, Bagging and so on.
> These cases could be covered by our own JSON format, which other systems 
> could easily parse.
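A custom JSON format for an ensemble could look roughly like the following sketch, which serializes a list of depth-1 trees (stumps) by hand. The schema (`type`, `trees`, `feature`, `threshold`, `left`, `right`) and the `EnsembleJson`/`Stump` classes are assumptions for illustration only. They are not an agreed Ignite format.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical JSON export of an ensemble of depth-1 decision trees (stumps). */
final class EnsembleJson {
    /** A stump: predicts 'right' if x[feature] > threshold, else 'left'. */
    static final class Stump {
        final int feature;
        final double threshold;
        final double left;
        final double right;

        Stump(int feature, double threshold, double left, double right) {
            this.feature = feature;
            this.threshold = threshold;
            this.left = left;
            this.right = right;
        }

        String toJson() {
            return String.format(Locale.ROOT,
                "{\"feature\":%d,\"threshold\":%s,\"left\":%s,\"right\":%s}",
                feature, threshold, left, right);
        }
    }

    /** Wraps all trees into one object; 'type' is assumed to need no escaping. */
    static String toJson(String type, List<Stump> trees) {
        String body = trees.stream().map(Stump::toJson).collect(Collectors.joining(","));
        return "{\"type\":\"" + type + "\",\"trees\":[" + body + "]}";
    }

    public static void main(String[] args) {
        List<Stump> forest = Arrays.asList(
            new Stump(1, 0.4368, 0.0, 1.0),
            new Stump(0, 2.5, 1.0, 0.0));

        System.out.println(toJson("RandomForest", forest));
    }
}
```

A flat, self-describing structure like this can be read by any JSON parser, which is the property the issue asks for. A real format would also need versioning and an aggregation rule for the ensemble.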



[jira] [Updated] (IGNITE-6642) [Umbrella] Model export/import to PMML and custom JSON format

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-6642:

Description: 
 

We need to be able to export/import Ignite models across clusters with 
different Ignite versions, and to have an exchangeable, human-readable format 
for inference in other systems such as scikit-learn, Spark ML, etc.

The PMML format is a good choice here: 

PMML (Predictive Model Markup Language) is an XML-based language used in 
Spark MLlib and other platforms.

Here is some additional info about PMML:

(i) [http://dmg.org/pmml/v4-3/GeneralStructure.html]
 (i) [https://github.com/jpmml/jpmml-model]

 

But PMML has limited support for ensembles such as Random Forest, Gradient 
Boosted Trees, Stacking, Bagging and so on.

These cases could be covered by our own JSON format, which other systems 
could easily parse.

  was:
 

 

 

PMML (Predictive Model Markup Language) is an XML-based language used in 
Spark MLlib and other platforms.

Here is some additional info about PMML:

(i) [http://dmg.org/pmml/v4-3/GeneralStructure.html]
 (i) [https://github.com/jpmml/jpmml-model]


> [Umbrella] Model export/import to PMML and custom JSON format
> -
>
> Key: IGNITE-6642
> URL: https://issues.apache.org/jira/browse/IGNITE-6642
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.9
>
>
>  
> We need to be able to export/import Ignite models across clusters 
> with different Ignite versions, and to have an exchangeable, human-readable 
> format for inference in other systems such as scikit-learn, Spark ML, etc.
> The PMML format is a good choice here: 
> PMML (Predictive Model Markup Language) is an XML-based language used in 
> Spark MLlib and other platforms.
> Here is some additional info about PMML:
> (i) [http://dmg.org/pmml/v4-3/GeneralStructure.html]
>  (i) [https://github.com/jpmml/jpmml-model]
>  
> But PMML has limited support for ensembles such as Random Forest, Gradient 
> Boosted Trees, Stacking, Bagging and so on.
> These cases could be covered by our own JSON format, which other systems 
> could easily parse.



[jira] [Updated] (IGNITE-6642) [Umbrella] Model export/import to PMML and custom JSON format

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-6642:

Description: 
 

 

 

PMML (Predictive Model Markup Language) is an XML-based language used in 
Spark MLlib and other platforms.

Here is some additional info about PMML:

(i) [http://dmg.org/pmml/v4-3/GeneralStructure.html]
 (i) [https://github.com/jpmml/jpmml-model]

  was:
PMML (Predictive Model Markup Language) is an XML-based language used in 
Spark MLlib and other platforms.

Here is some additional info about PMML:

(i) http://dmg.org/pmml/v4-3/GeneralStructure.html
(i) https://github.com/jpmml/jpmml-model


> [Umbrella] Model export/import to PMML and custom JSON format
> -
>
> Key: IGNITE-6642
> URL: https://issues.apache.org/jira/browse/IGNITE-6642
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.9
>
>
>  
>  
>  
> PMML (Predictive Model Markup Language) is an XML-based language used in 
> Spark MLlib and other platforms.
> Here is some additional info about PMML:
> (i) [http://dmg.org/pmml/v4-3/GeneralStructure.html]
>  (i) [https://github.com/jpmml/jpmml-model]



[jira] [Updated] (IGNITE-12337) [ML] Redesign the package structure

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12337:
-
Fix Version/s: (was: 2.9)
   2.10

> [ML] Redesign the package structure
> ---
>
> Key: IGNITE-12337
> URL: https://issues.apache.org/jira/browse/IGNITE-12337
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Minor
> Fix For: 2.10
>
>
> The problem is as follows: many classes and algorithms are located in 
> inappropriate places and are not grouped into high-level packages.



[jira] [Updated] (IGNITE-7593) Improve data used in DecisionTreesExample

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-7593:

Fix Version/s: (was: 2.9)

> Improve data used in DecisionTreesExample
> -
>
> Key: IGNITE-7593
> URL: https://issues.apache.org/jira/browse/IGNITE-7593
> Project: Ignite
>  Issue Type: Task
>  Components: ml
>Reporter: Oleg Ignatenko
>Assignee: Alexey Zinoviev
>Priority: Minor
>
> The data currently used in {{DecisionTreesExample}} does not look optimal:
> # It is large, as evidenced by the warning in the javadocs: "It is recommended to 
> start at least one node prior to launching this example if you intend to run 
> it with default memory settings."
> # It makes the example run for quite a long time.
> # It doesn't have a license (likely meaning "all rights reserved" by default), 
> which makes it troublesome to include in the project sources. The current 
> approach is therefore to prompt the user to download it, which is further 
> complicated by making the example skip when run unattended from {{IgniteExamplesMLTestSuite}}.
> I suggest finding or constructing a smaller dataset for this example that would 
> still demonstrate how the algorithm works and at the same time 
> would 1) use less memory, 2) run faster and 3) allow 
> shipping it within the project instead of prompting the user to download it.



[jira] [Updated] (IGNITE-12331) [ML] ML Preprocessing doesn't work on SQL Tables

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12331:
-
Fix Version/s: (was: 2.9)
   2.10

> [ML] ML Preprocessing doesn't work on SQL Tables
> 
>
> Key: IGNITE-12331
> URL: https://issues.apache.org/jira/browse/IGNITE-12331
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.8
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>
> {code:java}
> /*
>  * Licensed to the Apache Software Foundation (ASF) under one or more
>  * contributor license agreements.  See the NOTICE file distributed with
>  * this work for additional information regarding copyright ownership.
>  * The ASF licenses this file to You under the Apache License, Version 2.0
>  * (the "License"); you may not use this file except in compliance with
>  * the License.  You may obtain a copy of the License at
>  *
>  *  http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> package org.apache.ignite.examples.ml.tutorial.sql;
>
> import java.util.List;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteCache;
> import org.apache.ignite.Ignition;
> import org.apache.ignite.cache.query.QueryCursor;
> import org.apache.ignite.cache.query.SqlFieldsQuery;
> import org.apache.ignite.configuration.CacheConfiguration;
> import org.apache.ignite.internal.util.IgniteUtils;
> import org.apache.ignite.ml.dataset.feature.extractor.Vectorizer;
> import org.apache.ignite.ml.dataset.feature.extractor.impl.BinaryObjectVectorizer;
> import org.apache.ignite.ml.math.primitives.vector.Vector;
> import org.apache.ignite.ml.math.primitives.vector.VectorUtils;
> import org.apache.ignite.ml.preprocessing.Preprocessor;
> import org.apache.ignite.ml.preprocessing.minmaxscaling.MinMaxScalerTrainer;
> import org.apache.ignite.ml.preprocessing.normalization.NormalizationTrainer;
> import org.apache.ignite.ml.sql.SqlDatasetBuilder;
> import org.apache.ignite.ml.tree.DecisionTreeClassificationTrainer;
> import org.apache.ignite.ml.tree.DecisionTreeNode;
>
> /**
>  * Example of using distributed {@link DecisionTreeClassificationTrainer} on a data stored in SQL table.
>  */
> public class PreprocessingAndTrainingSQLTableExample {
>     /**
>      * Dummy cache name.
>      */
>     private static final String DUMMY_CACHE_NAME = "dummy_cache";
>
>     /**
>      * Training data.
>      */
>     private static final String TRAIN_DATA_RES = "examples/src/main/resources/datasets/titanic_train.csv";
>
>     /**
>      * Test data.
>      */
>     private static final String TEST_DATA_RES = "examples/src/main/resources/datasets/titanic_test.csv";
>
>     /**
>      * Run example.
>      */
>     public static void main(String[] args) {
>         System.out.println(">>> Decision tree classification trainer example started.");
>
>         // Start ignite grid.
>         try (Ignite ignite = Ignition.start("examples/config/example-ignite.xml")) {
>             System.out.println(">>> Ignite grid started.");
>
>             // Dummy cache is required to perform SQL queries.
>             CacheConfiguration cacheCfg = new CacheConfiguration<>(DUMMY_CACHE_NAME)
>                 .setSqlSchema("PUBLIC");
>
>             IgniteCache cache = null;
>             try {
>                 cache = ignite.getOrCreateCache(cacheCfg);
>
>                 System.out.println(">>> Creating table with training data...");
>                 cache.query(new SqlFieldsQuery("create table titanic_train (\n" +
>                     "passengerid int primary key,\n" +
>                     "survived int,\n" +
>                     "pclass int,\n" +
>                     "name varchar(255),\n" +
>                     "sex varchar(255),\n" +
>                     "age float,\n" +
>                     "sibsp int,\n" +
>                     "parch int,\n" +
>                     "ticket varchar(255),\n" +
>                     "fare float,\n" +
>                     "cabin varchar(255),\n" +
>                     "embarked varchar(255)\n" +
>                     ") with \"template=partitioned\";")).getAll();
>
>                 System.out.println(">>> Filling training data...");
>                 cache.query(new SqlFieldsQuery("insert into titanic_train select * from csvread('" +
>                     IgniteUtils.resolveI

[jira] [Updated] (IGNITE-12685) [ML] [Umbrella] Unify Preprocessors and Pipeline approaches to collect common statistics

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12685:
-
Fix Version/s: (was: 2.9)
   2.10

> [ML] [Umbrella] Unify Preprocessors  and Pipeline approaches to collect 
> common statistics 
> --
>
> Key: IGNITE-12685
> URL: https://issues.apache.org/jira/browse/IGNITE-12685
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>
> In the current implementation, Cross-Validation behaves differently when it 
> runs on the experimental Pipeline and when it runs on a chain of Preprocessors.
>  
> Look at tutorial step 8, CV_Param_Grid and 8_CV_Param_Grid_and_pipeline.
> In the first example, all preprocessors are fitted on the whole dataset and 
> don't use the train/test filter (due to the limited preprocessor API), so they 
> collect statistics on the whole initial dataset.
>  
> In the second example, the preprocessors are honestly re-fitted on each 
> cross-validation fold, three times with three different sets of statistics. 
> As a result, we could get different encoding values, Max/Min values for 
> each column, and so on.
>  
> We should study this question and align with the most popular approaches.
>  
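The difference between the two examples can be illustrated with plain Min/Max statistics for one column. This is hypothetical code, not the Ignite preprocessor API, and it uses leave-one-out folds for brevity. Fitting once on the whole dataset lets the held-out rows influence the statistics. Re-fitting per fold uses only that fold's training rows, so the learned Min/Max can differ from fold to fold.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

/** Hypothetical Min/Max statistics for one column, fitted on selected rows only. */
final class FoldStats {
    /** Returns {min, max} over the rows whose index is accepted by the filter. */
    static double[] fitMinMax(double[] col, IntPredicate useRow) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;

        for (int i = 0; i < col.length; i++) {
            if (!useRow.test(i))
                continue;

            min = Math.min(min, col[i]);
            max = Math.max(max, col[i]);
        }

        return new double[] {min, max};
    }

    public static void main(String[] args) {
        double[] age = {22, 38, 26, 35, 80};

        // Approach 1: one fit over all rows, shared by every fold.
        System.out.println("whole dataset: " + Arrays.toString(fitMinMax(age, i -> true)));

        // Approach 2: leave-one-out folds, re-fitted on each fold's training rows.
        for (int testRow = 0; testRow < age.length; testRow++) {
            final int held = testRow;
            System.out.println("fold " + testRow + ": " + Arrays.toString(fitMinMax(age, i -> i != held)));
        }
    }
}
```

Here the fold that holds out the value 80 learns a maximum of 38, while the whole-dataset fit reports 80 for every fold.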



[jira] [Updated] (IGNITE-10426) [ML] Spread parameter isKeepRawLabels across all models

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10426:
-
Fix Version/s: (was: 2.9)
   2.10

> [ML] Spread parameter isKeepRawLabels across all models
> ---
>
> Key: IGNITE-10426
> URL: https://issues.apache.org/jira/browse/IGNITE-10426
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>
> Currently, a few models have the isKeepRawLabels parameter and a threshold that 
> converts the predicted value to one of the class labels, 1 or 0.
> We should discuss this on the dev list and think about how to solve this task to optimize 
> MultiClassModel.
> Possible solutions:
>  * add these methods to the common model
>  * add this method to MultiClassModel and use reflection to check this 
> parameter, for example in the apply method
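The behaviour under discussion can be sketched for a binary model that produces a raw score. With `keepRawLabels` set, the model returns the raw score. Otherwise, the threshold maps the score to label 1.0 or 0.0. `ThresholdedModel`, its fields and the logistic raw score are hypothetical, and existing Ignite models may name or implement these differently.

```java
/** Hypothetical binary model showing the raw-label / threshold switch. */
final class ThresholdedModel {
    private final double[] weights;
    private final boolean keepRawLabels;
    private final double threshold;

    ThresholdedModel(double[] weights, boolean keepRawLabels, double threshold) {
        this.weights = weights;
        this.keepRawLabels = keepRawLabels;
        this.threshold = threshold;
    }

    /** Raw score: a logistic function of the dot product (an assumption of this sketch). */
    double rawScore(double[] x) {
        double dot = 0;
        for (int i = 0; i < weights.length; i++)
            dot += weights[i] * x[i];

        return 1.0 / (1.0 + Math.exp(-dot));
    }

    /** Returns the raw score, or class label 1.0/0.0 depending on the threshold. */
    double predict(double[] x) {
        double score = rawScore(x);

        if (keepRawLabels)
            return score;

        return score > threshold ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        double[] w = {2.0};
        System.out.println(new ThresholdedModel(w, true, 0.5).predict(new double[] {0.0}));  // 0.5
        System.out.println(new ThresholdedModel(w, false, 0.5).predict(new double[] {1.0})); // 1.0
    }
}
```

If these two fields lived on a common model type (the first option above), a MultiClassModel could read them directly instead of using reflection.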



[jira] [Updated] (IGNITE-12079) [ML][Umbrella] Add advanced preprocessing techniques

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12079:
-
Fix Version/s: (was: 2.9)
   2.10

> [ML][Umbrella] Add advanced preprocessing techniques
> 
>
> Key: IGNITE-12079
> URL: https://issues.apache.org/jira/browse/IGNITE-12079
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>
> *Main goal:*
> To reduce the gap between Apache Spark and Apache Ignite in preprocessing 
> operations. Reducing the gap could help with loading Spark ML Pipelines 
> into Ignite ML.
>  
> Next steps:
>  # Add Frequency Encoder
>  # Add two Imputing Strategies (MIN, MAX, COUNT, MOST_FREQUENT, 
> LEAST_FREQUENT)
>  # Add RobustScaler (will be added in Spark 3.0)
>  # Add CountVectorizer
>  # Add FeatureHasher
>  # Add QuantileDiscretizer
>  # Add Locality Sensitive Hashing (LSH)
>  # Add LabelEncoder
>  # Add RevertStringIndexing
>  # Add multi-column preprocessor
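The first item, a frequency encoder, could look roughly like this in plain Java, assuming each category is replaced by its relative frequency in the fitted column and unseen categories map to 0.0. The class name is hypothetical; this is not the Ignite ML preprocessor API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical frequency encoder: replaces each categorical value with the share
// of rows in which it occurs. Fit once, then reuse the learned map for encoding.
class FrequencyEncoderSketch {
    private final Map<String, Double> frequencies = new HashMap<>();

    FrequencyEncoderSketch fit(List<String> column) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : column)
            counts.merge(v, 1, Integer::sum);
        for (Map.Entry<String, Integer> e : counts.entrySet())
            frequencies.put(e.getKey(), e.getValue() / (double) column.size());
        return this;
    }

    /** Unseen categories are encoded as 0.0 in this sketch (an assumption). */
    double encode(String value) {
        return frequencies.getOrDefault(value, 0.0);
    }

    public static void main(String[] args) {
        FrequencyEncoderSketch enc = new FrequencyEncoderSketch()
            .fit(java.util.Arrays.asList("red", "blue", "red", "green"));
        System.out.println(enc.encode("red"));
    }
}
```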





[jira] [Updated] (IGNITE-10539) [ML] Make 'with' methods consistent

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10539:
-
Fix Version/s: (was: 2.9)
   2.10

> [ML] Make 'with' methods consistent
> ---
>
> Key: IGNITE-10539
> URL: https://issues.apache.org/jira/browse/IGNITE-10539
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Artem Malykh
>Assignee: Alexey Zinoviev
>Priority: Critical
> Fix For: 2.10
>
>
> In some places we have 'with*' methods that make in-place changes and return 
> the object itself (for example MLPTrainer::withLoss), while in other places 
> they create new instances with the corresponding parameter changed (for 
> example DatasetBuilder::withFilter, 
> DatasetBuilder::withUpstreamTransformer). This inconsistency makes users look 
> into the javadoc each time and worsens the overall consistency of the API.
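The two styles can be contrasted with a pair of hypothetical parameter classes (illustration only, not Ignite ML code):

```java
// In-place style: the 'with' method mutates this object and returns it.
class MutableParams {
    private int maxIterations = 100;

    MutableParams withMaxIterations(int maxIterations) {
        this.maxIterations = maxIterations;
        return this;
    }

    int maxIterations() { return maxIterations; }
}

// Copy style: the 'with' method leaves this object untouched and returns a new one.
final class ImmutableParams {
    private final int maxIterations;

    ImmutableParams(int maxIterations) { this.maxIterations = maxIterations; }

    ImmutableParams withMaxIterations(int maxIterations) {
        return new ImmutableParams(maxIterations);
    }

    int maxIterations() { return maxIterations; }

    public static void main(String[] args) {
        ImmutableParams p = new ImmutableParams(100);
        p.withMaxIterations(10);                  // result ignored: p is unchanged
        System.out.println(p.maxIterations());
    }
}
```

With the copy style, a call such as `params.withMaxIterations(10);` whose result is ignored has no effect, while the same call in the in-place style changes `params`; that difference is what forces users back to the javadoc when both styles coexist.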





[jira] [Updated] (IGNITE-9414) [ML] Using sparse vectors in Tree-based algorithms.

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-9414:

Fix Version/s: (was: 2.9)
   2.10

> [ML] Using sparse vectors in Tree-based algorithms.
> ---
>
> Key: IGNITE-9414
> URL: https://issues.apache.org/jira/browse/IGNITE-9414
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>
> We need to support sparse vectors in DecisionTrees, RF and GDB.





[jira] [Updated] (IGNITE-9415) [ML] Using sparse vectors in LSQR and MLP

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-9415:

Fix Version/s: (was: 2.9)
   2.10

> [ML] Using sparse vectors in LSQR and MLP
> -
>
> Key: IGNITE-9415
> URL: https://issues.apache.org/jira/browse/IGNITE-9415
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>
> We need to investigate and apply sparse vector support in BLAS for LSQR and 
> MLP (or implement our own version).
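As a rough illustration of what sparse support buys in a BLAS-style kernel, here is a hypothetical dot product between a sparse vector (an index-to-value map where absent indices are zeros) and a dense array. It touches only the stored non-zero entries; it is a sketch, not Ignite's BLAS implementation.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sparse kernel: the sparse vector is stored as index -> value,
// and every index that is absent from the map is treated as 0.0.
class SparseDotSketch {
    /** Dot product of a sparse vector with a dense array in O(nnz) instead of O(n). */
    static double dot(Map<Integer, Double> sparse, double[] dense) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : sparse.entrySet())
            sum += e.getValue() * dense[e.getKey()];
        return sum;
    }

    public static void main(String[] args) {
        Map<Integer, Double> x = new TreeMap<>();
        x.put(1, 2.0);
        x.put(3, -1.0);
        System.out.println(dot(x, new double[] {5.0, 4.0, 3.0, 2.0}));
    }
}
```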





[jira] [Updated] (IGNITE-12288) [ML] Replace assert logic with exceptions

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12288:
-
Fix Version/s: (was: 2.9)
   2.10

> [ML] Replace assert logic with exceptions
> -
>
> Key: IGNITE-12288
> URL: https://issues.apache.org/jira/browse/IGNITE-12288
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Critical
> Fix For: 2.10
>
>
> 1) Add exceptions instead of assert logic
> 2) Add tests for the proposed exceptions
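A hypothetical example of the proposed change. A Java `assert` is skipped unless the JVM runs with `-ea` (assertions are disabled by default), so an explicit exception makes the check apply in production too. The method name and the choice of IllegalArgumentException are assumptions for illustration.

```java
// Hypothetical vector helper where an assert has been replaced by an exception.
class VectorChecks {
    static double dot(double[] a, double[] b) {
        // Before: assert a.length == b.length;  (silently skipped without -ea)
        if (a.length != b.length)
            throw new IllegalArgumentException(
                "Vector sizes differ: " + a.length + " != " + b.length);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++)
            sum += a[i] * b[i];
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(dot(new double[] {1, 2}, new double[] {3, 4}));
    }
}
```

The second step of the ticket then becomes a test that expects the exception, as in the check below.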





[jira] [Updated] (IGNITE-11664) [ML] Use Double.NaN as default values for missing values in Vector

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-11664:
-
Fix Version/s: (was: 2.9)
   2.10

> [ML] Use Double.NaN as default values for missing values in Vector
> --
>
> Key: IGNITE-11664
> URL: https://issues.apache.org/jira/browse/IGNITE-11664
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
>  Labels: stability
> Fix For: 2.10
>
>
> Currently, we use 0.0 as the default value in vectors when a value is 
> missing. This contradicts the preprocessors' policy, where Double.NaN is 
> used for missing values. Moreover, Double.NaN is a more convenient value 
> for missing feature values.
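A minimal sketch of the proposed convention, assuming a sparse vector backed by a map. The class is hypothetical, not Ignite's Vector API: entries that were never set read as Double.NaN, so a stored 0.0 stays distinguishable from a missing value and an imputer can detect gaps with Double.isNaN.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sparse feature vector whose unset entries read as Double.NaN.
class NaNDefaultVector {
    private final int size;
    private final Map<Integer, Double> values = new HashMap<>();

    NaNDefaultVector(int size) { this.size = size; }

    NaNDefaultVector set(int i, double v) { values.put(i, v); return this; }

    /** Missing entries return NaN instead of 0.0. */
    double get(int i) {
        if (i < 0 || i >= size)
            throw new IndexOutOfBoundsException("Index: " + i);
        return values.getOrDefault(i, Double.NaN);
    }

    /** What an imputer would see: the number of missing (NaN) entries. */
    int countMissing() {
        int missing = 0;
        for (int i = 0; i < size; i++)
            if (Double.isNaN(get(i)))
                missing++;
        return missing;
    }

    public static void main(String[] args) {
        NaNDefaultVector v = new NaNDefaultVector(3).set(0, 0.0);
        System.out.println(v.get(0) + " " + v.get(1) + " " + v.countMissing());
    }
}
```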





[jira] [Updated] (IGNITE-10503) [ML] Meta information for vectors

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10503:
-
Fix Version/s: (was: 2.9)
   2.10

> [ML] Meta information for vectors
> -
>
> Key: IGNITE-10503
> URL: https://issues.apache.org/jira/browse/IGNITE-10503
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Platonov
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.10
>
>
> We need to design and implement vector meta-information such as feature 
> names, bagging information, etc.
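One possible shape for such meta-information, sketched with a hypothetical NamedVector class that carries feature names only (bagging information is left out). It is an illustration, not a proposed Ignite API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical vector that keeps feature names next to the values.
class NamedVector {
    private final double[] values;
    private final List<String> names;
    private final Map<String, Integer> indexByName = new HashMap<>();

    NamedVector(List<String> names, double[] values) {
        if (names.size() != values.length)
            throw new IllegalArgumentException("Names and values differ in length");
        this.names = names;
        this.values = values.clone();
        for (int i = 0; i < names.size(); i++)
            indexByName.put(names.get(i), i);
    }

    double get(int i) { return values[i]; }

    /** Looks a feature up by name using the stored meta-information. */
    double get(String name) {
        Integer i = indexByName.get(name);
        if (i == null)
            throw new IllegalArgumentException("Unknown feature: " + name);
        return values[i];
    }

    String nameOf(int i) { return names.get(i); }

    public static void main(String[] args) {
        NamedVector v = new NamedVector(java.util.Arrays.asList("age", "fare"), new double[] {30.0, 7.25});
        System.out.println(v.nameOf(1) + " = " + v.get("fare"));
    }
}
```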





[jira] [Updated] (IGNITE-6642) [Umbrella] Model export/import to PMML and custom JSON format

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-6642:

Priority: Major  (was: Minor)

> [Umbrella] Model export/import to PMML and custom JSON format
> -
>
> Key: IGNITE-6642
> URL: https://issues.apache.org/jira/browse/IGNITE-6642
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.9
>
>
> PMML (Predictive Model Markup Language) is an XML-based language that is 
> used in Spark MLlib and other platforms.
> Here is some additional info about PMML:
> (i) http://dmg.org/pmml/v4-3/GeneralStructure.html
> (i) https://github.com/jpmml/jpmml-model
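For orientation, a hypothetical exporter that writes a linear regression model as a PMML 4.3 RegressionModel document using plain string building. It is a sketch under stated assumptions (feature names that need no XML escaping, a single continuous target), not Ignite's or JPMML's API.

```java
// Hypothetical exporter: linear regression -> PMML 4.3 RegressionModel document.
class PmmlLinearRegressionExport {
    /** Assumes feature and target names need no XML escaping. */
    static String toPmml(String[] features, double[] coefficients, double intercept, String target) {
        if (features.length != coefficients.length)
            throw new IllegalArgumentException("One coefficient per feature is required");
        StringBuilder sb = new StringBuilder();
        sb.append("<PMML xmlns=\"http://www.dmg.org/PMML-4_3\" version=\"4.3\">\n");
        sb.append("  <Header/>\n");
        // Data dictionary: every input feature plus the target field.
        sb.append("  <DataDictionary numberOfFields=\"").append(features.length + 1).append("\">\n");
        for (String f : features)
            sb.append("    <DataField name=\"").append(f).append("\" optype=\"continuous\" dataType=\"double\"/>\n");
        sb.append("    <DataField name=\"").append(target).append("\" optype=\"continuous\" dataType=\"double\"/>\n");
        sb.append("  </DataDictionary>\n");
        sb.append("  <RegressionModel functionName=\"regression\">\n");
        sb.append("    <MiningSchema>\n");
        for (String f : features)
            sb.append("      <MiningField name=\"").append(f).append("\"/>\n");
        sb.append("      <MiningField name=\"").append(target).append("\" usageType=\"target\"/>\n");
        sb.append("    </MiningSchema>\n");
        // The learned parameters: intercept plus one coefficient per feature.
        sb.append("    <RegressionTable intercept=\"").append(intercept).append("\">\n");
        for (int i = 0; i < features.length; i++)
            sb.append("      <NumericPredictor name=\"").append(features[i])
              .append("\" coefficient=\"").append(coefficients[i]).append("\"/>\n");
        sb.append("    </RegressionTable>\n");
        sb.append("  </RegressionModel>\n");
        sb.append("</PMML>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toPmml(new String[] {"x1", "x2"}, new double[] {2.0, -1.5}, 0.5, "y"));
    }
}
```

Import would go the other way (parse RegressionTable back into coefficients); a library such as JPMML would normally handle validation against the schema.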





[jira] [Updated] (IGNITE-6642) [Umbrella] Model export/import to PMML and custom JSON format

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-6642:

Summary: [Umbrella] Model export/import to PMML and custom JSON format  
(was: [Umbrella] Integration with PMML)

> [Umbrella] Model export/import to PMML and custom JSON format
> -
>
> Key: IGNITE-6642
> URL: https://issues.apache.org/jira/browse/IGNITE-6642
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Minor
> Fix For: 2.9
>
>
> PMML (Predictive Model Markup Language) is an XML-based language that is 
> used in Spark MLlib and other platforms.
> Here is some additional info about PMML:
> (i) http://dmg.org/pmml/v4-3/GeneralStructure.html
> (i) https://github.com/jpmml/jpmml-model





[jira] [Updated] (IGNITE-6642) [Umbrella] Integration with PMML

2020-06-26 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-6642:

Reporter: Alexey Zinoviev  (was: Yury Babak)

> [Umbrella] Integration with PMML
> 
>
> Key: IGNITE-6642
> URL: https://issues.apache.org/jira/browse/IGNITE-6642
> Project: Ignite
>  Issue Type: New Feature
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Minor
> Fix For: 2.9
>
>
> PMML (Predictive Model Markup Language) is an XML-based language that is 
> used in Spark MLlib and other platforms.
> Here is some additional info about PMML:
> (i) http://dmg.org/pmml/v4-3/GeneralStructure.html
> (i) https://github.com/jpmml/jpmml-model





[jira] [Commented] (IGNITE-12716) Ignite support for spark-3.0.0

2020-06-24 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143850#comment-17143850
 ] 

Alexey Zinoviev commented on IGNITE-12716:
--

Dear [~shensonj], I'm not going to release anything for Spark support; I hope 
another contributor can help.

> Ignite support for spark-3.0.0
> --
>
> Key: IGNITE-12716
> URL: https://issues.apache.org/jira/browse/IGNITE-12716
> Project: Ignite
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.7, 2.8, 2.7.5, 2.7.6
>Reporter: Shenson Joseph
>Priority: Blocker
>
> Ignite support for spark-3.0.0





[jira] [Updated] (IGNITE-12903) Fix ML + SQL examples

2020-06-16 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12903:
-
Fix Version/s: 2.9

> Fix ML + SQL examples
> -
>
> Key: IGNITE-12903
> URL: https://issues.apache.org/jira/browse/IGNITE-12903
> Project: Ignite
>  Issue Type: Task
>  Components: examples, ml
>Reporter: Taras Ledkov
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 2.9
>
>
> The examples
> {{DecisionTreeClassificationTrainerSQLInferenceExample}}
> {{DecisionTreeClassificationTrainerSQLTableExample}}
> use the CSVREAD function for the initial load of data into the cluster.
> This must be changed because the function is disabled by default.
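One way to avoid CSVREAD, sketched under assumptions: parse the CSV in the application and insert the rows with a JDBC batch. The table DATA(ID, F1, F2, LABEL), the three-column CSV layout and the class name are hypothetical. The Connection could come from DriverManager with the Ignite JDBC thin driver (a URL such as jdbc:ignite:thin://127.0.0.1/), but that part is not exercised here.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader: CSV parsed client-side, rows inserted via a JDBC batch.
class CsvToSqlLoader {
    /** Parses simple CSV (no quoted fields, first line is a header) into rows of doubles. */
    static List<double[]> parse(String csv) {
        List<double[]> rows = new ArrayList<>();
        String[] lines = csv.split("\\R");
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].trim().isEmpty())
                continue;
            String[] cells = lines[i].split(",");
            double[] row = new double[cells.length];
            for (int j = 0; j < cells.length; j++)
                row[j] = Double.parseDouble(cells[j].trim());
            rows.add(row);
        }
        return rows;
    }

    /** Inserts rows into the hypothetical table DATA(ID, F1, F2, LABEL). */
    static void load(Connection conn, List<double[]> rows) throws SQLException {
        String sql = "INSERT INTO DATA (ID, F1, F2, LABEL) VALUES (?, ?, ?, ?)";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            int id = 0;
            for (double[] row : rows) {
                if (row.length < 3)
                    throw new IllegalArgumentException("Expected 3 columns, got " + row.length);
                stmt.setInt(1, id++);
                stmt.setDouble(2, row[0]);
                stmt.setDouble(3, row[1]);
                stmt.setDouble(4, row[2]);
                stmt.addBatch();
            }
            stmt.executeBatch();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("f1,f2,label\n1.0,2.0,0\n3.5,4.0,1\n").size() + " rows parsed");
    }
}
```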





[jira] [Commented] (IGNITE-12903) Fix ML + SQL examples

2020-06-16 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136757#comment-17136757
 ] 

Alexey Zinoviev commented on IGNITE-12903:
--

[~tledkov-gridgain] Great, I'd prefer the third example, as in the other 
examples, I suppose.

I'll wait for cool implementations.

I reassigned the ticket to myself and will fix it for 2.9.

> Fix ML + SQL examples
> -
>
> Key: IGNITE-12903
> URL: https://issues.apache.org/jira/browse/IGNITE-12903
> Project: Ignite
>  Issue Type: Task
>  Components: examples, ml
>Reporter: Taras Ledkov
>Assignee: Alexey Zinoviev
>Priority: Major
>
> The examples
> {{DecisionTreeClassificationTrainerSQLInferenceExample}}
> {{DecisionTreeClassificationTrainerSQLTableExample}}
> use the CSVREAD function for the initial load of data into the cluster.
> This must be changed because the function is disabled by default.





[jira] [Assigned] (IGNITE-12903) Fix ML + SQL examples

2020-06-16 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev reassigned IGNITE-12903:


Assignee: Alexey Zinoviev  (was: Taras Ledkov)

> Fix ML + SQL examples
> -
>
> Key: IGNITE-12903
> URL: https://issues.apache.org/jira/browse/IGNITE-12903
> Project: Ignite
>  Issue Type: Task
>  Components: examples
>Reporter: Taras Ledkov
>Assignee: Alexey Zinoviev
>Priority: Major
>
> The examples
> {{DecisionTreeClassificationTrainerSQLInferenceExample}}
> {{DecisionTreeClassificationTrainerSQLTableExample}}
> use the CSVREAD function for the initial load of data into the cluster.
> This must be changed because the function is disabled by default.





[jira] [Updated] (IGNITE-12903) Fix ML + SQL examples

2020-06-16 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12903:
-
Component/s: ml

> Fix ML + SQL examples
> -
>
> Key: IGNITE-12903
> URL: https://issues.apache.org/jira/browse/IGNITE-12903
> Project: Ignite
>  Issue Type: Task
>  Components: examples, ml
>Reporter: Taras Ledkov
>Assignee: Alexey Zinoviev
>Priority: Major
>
> The examples
> {{DecisionTreeClassificationTrainerSQLInferenceExample}}
> {{DecisionTreeClassificationTrainerSQLTableExample}}
> are used CSVREAD function to initial load data into cluster.
> Must be changed because this function is disabled by default





[jira] [Commented] (IGNITE-12903) Fix ML + SQL examples

2020-06-16 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136508#comment-17136508
 ] 

Alexey Zinoviev commented on IGNITE-12903:
--

[~tledkov-gridgain] What is the best way to fix this? Enable this function 
manually (could you suggest how, here in the comments?), or populate the 
cache manually rather than from CSV? What do you think?

> Fix ML + SQL examples
> -
>
> Key: IGNITE-12903
> URL: https://issues.apache.org/jira/browse/IGNITE-12903
> Project: Ignite
>  Issue Type: Task
>  Components: examples
>Reporter: Taras Ledkov
>Assignee: Taras Ledkov
>Priority: Major
>
> The examples
> {{DecisionTreeClassificationTrainerSQLInferenceExample}}
> {{DecisionTreeClassificationTrainerSQLTableExample}}
> use the CSVREAD function for the initial load of data into the cluster.
> This must be changed because the function is disabled by default.





[jira] [Updated] (IGNITE-8451) [ML] Refactor Labeled Dataset: remove unused methods and fields

2020-06-10 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-8451:

Fix Version/s: (was: 2.9)
   3.0

> [ML] Refactor Labeled Dataset: remove unused methods and fields
> ---
>
> Key: IGNITE-8451
> URL: https://issues.apache.org/jira/browse/IGNITE-8451
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 3.0
>
>
> Remove
>  * loading from file
>  * distributed version (we need the local version only)
>  * parent class Dataset and meta-information





[jira] [Updated] (IGNITE-12432) [Spark] Need to add test for AVG function in IgniteOptimizationAggregationFuncSpec

2020-06-10 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-12432:
-
Fix Version/s: (was: 2.9)
   3.0

> [Spark] Need to add test for AVG function in 
> IgniteOptimizationAggregationFuncSpec
> --
>
> Key: IGNITE-12432
> URL: https://issues.apache.org/jira/browse/IGNITE-12432
> Project: Ignite
>  Issue Type: Test
>  Components: spark
>Reporter: Alexey Zinoviev
>Assignee: Alexey Zinoviev
>Priority: Major
> Fix For: 3.0
>
>
> The test is skipped with "TODO: write me":
> it("AVG - DECIMAL") {
>  //TODO: write me
> }
> It should be merged into the Spark 2.3 and 2.4 modules together.




