[ https://issues.apache.org/jira/browse/IGNITE-12849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glenn Wiebe updated IGNITE-12849:
---------------------------------
    Attachment: DenseStringBinaryObjectVectorizer.java
                DenseIntBinaryObjectVectorizer.java

> Add New BinaryObject Vectorizer for SparseVectors and Integer Coordinates
> -------------------------------------------------------------------------
>
>                 Key: IGNITE-12849
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12849
>             Project: Ignite
>          Issue Type: New Feature
>          Components: ml
>    Affects Versions: 2.8
>            Reporter: Glenn Wiebe
>            Assignee: Alexey Zinoviev
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: DenseIntBinaryObjectVectorizer.java,
>                      DenseStringBinaryObjectVectorizer.java
>
>
> A. DenseVector-based BinaryObjectVectorizer
> When using existing caches as a source of Datasets, the
> BinaryObjectVectorizer is used.
> The existing BinaryObjectVectorizer only supports the creation of a
> SparseVector.
> The LUDecomposition utility that supports Gaussian factorization for models
> like GMM has a "Singularity indicator". A SparseVector's null handling sets
> a matrix column calculation to zero (0.0), which is below the minimum check
> value (1e-11), and thus flags the matrix as singular.
> This null handling of the SparseVector restricts the use of some
> algorithms, like Gaussian Mixture Models, where any Vector dimension that is
> null will incorrectly signal that a matrix is singular.
> It would be great if we could:
> - Have a BinaryObjectVectorizer that uses a DenseVector to eliminate this
> singularity trigger and enable use of the GMM Trainer.
>
> B. CacheBasedDatasets not treated as Temporary Cache
> When using a cache-based dataset, the close() method destroys the Ignite
> cache. This means that the data loaded into this dataset cannot be re-used.
> It would be great if we could:
> - Not destroy the Ignite Cache holding the dataset on close (of one step in
> an ML processing flow)
> - Allow for "attaching" to this prior, pre-calculated dataset in subsequent
> use.
>
> C. Vector Visibility
> Vectors (unlike other value types, e.g. BinaryObjects) are not visible in
> standard mechanisms, like the Ignite Web Console, where the toString() method
> does not present any information about the embedded vector values.
> It would be great if we could:
> - Have a Vector.toString() method implementation that presents some
> information about what is actually in the Vector.
>
> I have implemented the above items and have used them at a customer where I
> needed these capabilities (or at least they dramatically reduced the cost and
> increased the value of the solution).
> It would be great if the community were supportive of this
> expansion/improvement of the Ignite ML library.
> Thanks,
> Glenn
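
The singularity trigger described in item A can be illustrated with a small, self-contained sketch. This is not Ignite's LUDecomposition code. It is a plain Gaussian elimination with partial pivoting that treats any pivot below the 1e-11 threshold quoted in the report as a sign of singularity. The class name SingularityDemo, the method isSingular, and the sample matrices are hypothetical and exist only for this illustration. The second matrix assumes, as the report describes, that one dimension was null for every row and was therefore computed as 0.0.

```java
import java.util.Arrays;

public class SingularityDemo {
    /** Threshold quoted in the report; smaller pivots are treated as zero. */
    static final double SINGULARITY_THRESHOLD = 1e-11;

    /**
     * Runs Gaussian elimination with partial pivoting on a copy of a square
     * matrix and reports whether a near-zero pivot was found.
     */
    public static boolean isSingular(double[][] input) {
        int n = input.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++)
            a[i] = Arrays.copyOf(input[i], n);

        for (int col = 0; col < n; col++) {
            // Pick the row with the largest absolute value in this column.
            int pivot = col;
            for (int row = col + 1; row < n; row++)
                if (Math.abs(a[row][col]) > Math.abs(a[pivot][col]))
                    pivot = row;

            // A pivot below the threshold marks the matrix as singular.
            if (Math.abs(a[pivot][col]) < SINGULARITY_THRESHOLD)
                return true;

            // Swap the pivot row into place.
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;

            // Eliminate the entries below the pivot.
            for (int row = col + 1; row < n; row++) {
                double factor = a[row][col] / a[col][col];
                for (int k = col; k < n; k++)
                    a[row][k] -= factor * a[col][k];
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Covariance-like matrix where every dimension carries data.
        double[][] full = {
            {2.0, 0.5, 0.1},
            {0.5, 1.0, 0.2},
            {0.1, 0.2, 1.5}
        };
        // Same shape, but the second dimension is 0.0 everywhere.
        double[][] zeroDimension = {
            {2.0, 0.0, 0.1},
            {0.0, 0.0, 0.0},
            {0.1, 0.0, 1.5}
        };
        System.out.println("full: " + isSingular(full));
        System.out.println("zeroDimension: " + isSingular(zeroDimension));
    }
}
```

In this sketch the all-zero dimension leaves no usable pivot in its column, so the check fires regardless of how well-conditioned the remaining dimensions are, which is the behavior the report attributes to null handling in the sparse representation.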