[jira] [Created] (IGNITE-7827) Adopt kNN regression to the new Partitioned Dataset
Aleksey Zinoviev created IGNITE-7827: Summary: Adopt kNN regression to the new Partitioned Dataset Key: IGNITE-7827 URL: https://issues.apache.org/jira/browse/IGNITE-7827 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7828) Adopt yardstick tests for the new version of kNN regression algorithm
Aleksey Zinoviev created IGNITE-7828: Summary: Adopt yardstick tests for the new version of kNN regression algorithm Key: IGNITE-7828 URL: https://issues.apache.org/jira/browse/IGNITE-7828 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7829) Adopt kNN regression example to the new Partitioned Dataset
Aleksey Zinoviev created IGNITE-7829: Summary: Adopt kNN regression example to the new Partitioned Dataset Key: IGNITE-7829 URL: https://issues.apache.org/jira/browse/IGNITE-7829 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7830) Adopt kNN model to the new Partitioned Dataset
Aleksey Zinoviev created IGNITE-7830: Summary: Adopt kNN model to the new Partitioned Dataset Key: IGNITE-7830 URL: https://issues.apache.org/jira/browse/IGNITE-7830 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7875) Adopt SVM to the new Partitioned Dataset
Aleksey Zinoviev created IGNITE-7875: Summary: Adopt SVM to the new Partitioned Dataset Key: IGNITE-7875 URL: https://issues.apache.org/jira/browse/IGNITE-7875 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7876) [ML] Adopt SVM Linear Binary Classification Model and Trainer to the new Partitioned Dataset
Aleksey Zinoviev created IGNITE-7876: Summary: [ML] Adopt SVM Linear Binary Classification Model and Trainer to the new Partitioned Dataset Key: IGNITE-7876 URL: https://issues.apache.org/jira/browse/IGNITE-7876 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7887) [ML] Adopt SVM Linear Multi-Class Classification Model and Trainer to the new Partitioned Dataset
Aleksey Zinoviev created IGNITE-7887: Summary: [ML] Adopt SVM Linear Multi-Class Classification Model and Trainer to the new Partitioned Dataset Key: IGNITE-7887 URL: https://issues.apache.org/jira/browse/IGNITE-7887 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7932) [ML] Adopt SVM Linear Binary Classification Example to the new Partitioned Dataset
Aleksey Zinoviev created IGNITE-7932: Summary: [ML] Adopt SVM Linear Binary Classification Example to the new Partitioned Dataset Key: IGNITE-7932 URL: https://issues.apache.org/jira/browse/IGNITE-7932 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7938) [ML] Adopt KMeans to the new Partitioned Dataset
Aleksey Zinoviev created IGNITE-7938: Summary: [ML] Adopt KMeans to the new Partitioned Dataset Key: IGNITE-7938 URL: https://issues.apache.org/jira/browse/IGNITE-7938 Project: Ignite Issue Type: Improvement Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8005) [ML] Adopt SVM Linear MultiClass Classification Example to the new Partitioned Dataset
Aleksey Zinoviev created IGNITE-8005: Summary: [ML] Adopt SVM Linear MultiClass Classification Example to the new Partitioned Dataset Key: IGNITE-8005 URL: https://issues.apache.org/jira/browse/IGNITE-8005 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8168) [ML] Add KMeans version for Partitioned Datasets
Aleksey Zinoviev created IGNITE-8168: Summary: [ML] Add KMeans version for Partitioned Datasets Key: IGNITE-8168 URL: https://issues.apache.org/jira/browse/IGNITE-8168 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8169) [ML] Implement Model-Trainer pair for KMeans based on Partitioned Dataset
Aleksey Zinoviev created IGNITE-8169: Summary: [ML] Implement Model-Trainer pair for KMeans based on Partitioned Dataset Key: IGNITE-8169 URL: https://issues.apache.org/jira/browse/IGNITE-8169 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8170) [ML] Adopt KMeans example to the Partitioned Dataset
Aleksey Zinoviev created IGNITE-8170: Summary: [ML] Adopt KMeans example to the Partitioned Dataset Key: IGNITE-8170 URL: https://issues.apache.org/jira/browse/IGNITE-8170 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8250) Adopt Fuzzy CMeans to PartitionedDatasets
Aleksey Zinoviev created IGNITE-8250: Summary: Adopt Fuzzy CMeans to PartitionedDatasets Key: IGNITE-8250 URL: https://issues.apache.org/jira/browse/IGNITE-8250 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8396) Add documentation for kNN classification (release 2.5)
Aleksey Zinoviev created IGNITE-8396: Summary: Add documentation for kNN classification (release 2.5) Key: IGNITE-8396 URL: https://issues.apache.org/jira/browse/IGNITE-8396 Project: Ignite Issue Type: Improvement Components: documentation, ml Affects Versions: 2.5 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev In Apache Ignite 2.5 we added a normalization preprocessor that works on top of the partition-based dataset, and now we need to add documentation for this feature. Previous version: https://dash.readme.io/project/apacheignite/v2.4/docs/knn-classification Update it with the new version: https://dash.readme.io/project/apacheignite/v2.4/docs/k-nn-classification-25 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8397) Update documentation for kNN regression (release 2.5)
Aleksey Zinoviev created IGNITE-8397: Summary: Update documentation for kNN regression (release 2.5) Key: IGNITE-8397 URL: https://issues.apache.org/jira/browse/IGNITE-8397 Project: Ignite Issue Type: Improvement Components: documentation, ml Affects Versions: 2.5 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev In Apache Ignite 2.5 we changed kNN regression to work on top of the partition-based dataset, and now we need to update the documentation for this feature. Previous version: [https://dash.readme.io/project/apacheignite/v2.4/docs/knn-regression] Update it with the new version: [https://dash.readme.io/project/apacheignite/v2.4/docs/k-nn-regression-25] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8398) Update documentation for KMeans clustering (release 2.5)
Aleksey Zinoviev created IGNITE-8398: Summary: Update documentation for KMeans clustering (release 2.5) Key: IGNITE-8398 URL: https://issues.apache.org/jira/browse/IGNITE-8398 Project: Ignite Issue Type: Improvement Components: documentation, ml Affects Versions: 2.5 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev In Apache Ignite 2.5 we changed KMeans clustering to work on top of the partition-based dataset and removed FuzzyCMeans, and now we need to update the documentation for this feature. Previous version: [https://dash.readme.io/project/apacheignite/v2.4/docs/k-means-clustering] Update it with the new version: [https://dash.readme.io/project/apacheignite/v2.4/docs/k-means-clustering-25] IMPORTANT: Remove the page [https://dash.readme.io/project/apacheignite/v2.4/docs/fuzzy-c-means-clustering] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8399) Add documentation for kNN classification (release 2.5)
Aleksey Zinoviev created IGNITE-8399: Summary: Add documentation for kNN classification (release 2.5) Key: IGNITE-8399 URL: https://issues.apache.org/jira/browse/IGNITE-8399 Project: Ignite Issue Type: Improvement Components: documentation, ml Affects Versions: 2.5 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev In Apache Ignite 2.5 we added SVM binary and multi-class classification working on top of the partition-based dataset, and now we need to update the documentation for this feature. Add the page [https://dash.readme.io/project/apacheignite/v2.4/docs/svm-25] Add the page [https://dash.readme.io/project/apacheignite/v2.4/docs/svm-multi-class-classification-25] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8403) [ML] Add Binary Logistic Regression based on partitioned datasets and MLP
Aleksey Zinoviev created IGNITE-8403: Summary: [ML] Add Binary Logistic Regression based on partitioned datasets and MLP Key: IGNITE-8403 URL: https://issues.apache.org/jira/browse/IGNITE-8403 Project: Ignite Issue Type: New Feature Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8410) [ML] Unify KNNClassification/KNNRegression Model Trainer .fit() signatures
Aleksey Zinoviev created IGNITE-8410: Summary: [ML] Unify KNNClassification/KNNRegression Model Trainer .fit() signatures Key: IGNITE-8410 URL: https://issues.apache.org/jira/browse/IGNITE-8410 Project: Ignite Issue Type: Improvement Components: ml Affects Versions: 2.6 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Make the fit calls similar: refactor one of the trainers and remove one signature. A possible solution is to pass dataCache and ignite separately. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8450) [ML] Cleanup the ML package: remove unused vector/matrix classes
Aleksey Zinoviev created IGNITE-8450: Summary: [ML] Cleanup the ML package: remove unused vector/matrix classes Key: IGNITE-8450 URL: https://issues.apache.org/jira/browse/IGNITE-8450 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8451) [ML] Refactor Labeled Dataset: remove unused methods and fields
Aleksey Zinoviev created IGNITE-8451: Summary: [ML] Refactor Labeled Dataset: remove unused methods and fields Key: IGNITE-8451 URL: https://issues.apache.org/jira/browse/IGNITE-8451 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8511) [ML] Add support for Multi-Class Logistic Regression
Aleksey Zinoviev created IGNITE-8511: Summary: [ML] Add support for Multi-Class Logistic Regression Key: IGNITE-8511 URL: https://issues.apache.org/jira/browse/IGNITE-8511 Project: Ignite Issue Type: New Feature Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8542) [ML] Add OneVsRest Trainer to handle cases with multiple class labels in dataset
Aleksey Zinoviev created IGNITE-8542: Summary: [ML] Add OneVsRest Trainer to handle cases with multiple class labels in dataset Key: IGNITE-8542 URL: https://issues.apache.org/jira/browse/IGNITE-8542 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8567) [ML] Add Imputer and Binarizer for data preprocessing
Aleksey Zinoviev created IGNITE-8567: Summary: [ML] Add Imputer and Binarizer for data preprocessing Key: IGNITE-8567 URL: https://issues.apache.org/jira/browse/IGNITE-8567 Project: Ignite Issue Type: New Feature Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Imputing with the mean and most-frequent-value options can be distributed effectively. [http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Imputer.html#sklearn.preprocessing.Imputer] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9145) [ML] Add different strategies to index labels in StringEncoderTrainer
Aleksey Zinoviev created IGNITE-9145: Summary: [ML] Add different strategies to index labels in StringEncoderTrainer Key: IGNITE-9145 URL: https://issues.apache.org/jira/browse/IGNITE-9145 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.7
The main idea is to add a few indexing strategies: sorting and so on. Currently only one strategy is supported (the most frequent label gets index 0 and the least frequent label gets the largest index). There can be a few options:
* 'frequencyDesc': descending order by label frequency (most frequent label assigned 0)
* 'frequencyAsc': ascending order by label frequency (least frequent label assigned 0)
* 'alphabetDesc': descending alphabetical order
* 'alphabetAsc': ascending alphabetical order
Please update the method transformFrequenciesToEncodingValues and add the strategy as a parameter of the trainer.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
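The four indexing options listed in IGNITE-9145 can be sketched roughly as below. This is only an illustration of the idea: the class `LabelIndexingSketch`, its `Strategy` enum and the `encode` method are hypothetical names, not the StringEncoderTrainer API, and breaking frequency ties alphabetically is an assumption made here to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the Ignite ML API. */
final class LabelIndexingSketch {
    /** Strategy names mirror the options proposed in the ticket. */
    enum Strategy { FREQUENCY_DESC, FREQUENCY_ASC, ALPHABET_DESC, ALPHABET_ASC }

    /** Maps each label to an index according to the chosen strategy. */
    static Map<String, Integer> encode(Map<String, Integer> frequencies, Strategy strategy) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(frequencies.entrySet());

        Comparator<Map.Entry<String, Integer>> byKey = Map.Entry.comparingByKey();
        Comparator<Map.Entry<String, Integer>> byFreq = Map.Entry.comparingByValue();

        // Assumption: ties on frequency are broken alphabetically.
        Comparator<Map.Entry<String, Integer>> order;
        switch (strategy) {
            case FREQUENCY_DESC: order = byFreq.reversed().thenComparing(byKey); break;
            case FREQUENCY_ASC:  order = byFreq.thenComparing(byKey); break;
            case ALPHABET_DESC:  order = byKey.reversed(); break;
            default:             order = byKey; break;
        }
        entries.sort(order);

        // The position in the sorted list becomes the encoded index.
        Map<String, Integer> res = new HashMap<>();
        for (int i = 0; i < entries.size(); i++)
            res.put(entries.get(i).getKey(), i);
        return res;
    }
}
```

Passing the strategy as an argument keeps the existing frequency-counting step unchanged; only the ordering of the counted labels differs between strategies.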
[jira] [Created] (IGNITE-6693) private method initBlockFor in BlockMatrixStorage works incorrect for small matricies
Aleksey Zinoviev created IGNITE-6693: Summary: private method initBlockFor in BlockMatrixStorage works incorrect for small matricies Key: IGNITE-6693 URL: https://issues.apache.org/jira/browse/IGNITE-6693 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Components: ml Reporter: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6761) Change block numeration in SparseBlockMatrixStorage
Aleksey Zinoviev created IGNITE-6761: Summary: Change block numeration in SparseBlockMatrixStorage Key: IGNITE-6761 URL: https://issues.apache.org/jira/browse/IGNITE-6761 Project: Ignite Issue Type: Improvement Security Level: Public (Viewable by anyone) Components: ml Reporter: Aleksey Zinoviev
Please change the block numeration scheme from 1-dimensional coordinates to 2-dimensional coordinates. This helps to avoid the complex calculation of the blockId and of the row and column of the CacheEntries for a block. There are currently a few bugs there.
From
||Heading 1||Heading 2||
|0|1|
|2|3|
|4|5|
to
||Heading 1||Heading 2||
|0,0|0,1|
|1,0|1,1|
|2,0|2,1|
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
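A minimal sketch of the mapping between the two numeration schemes in IGNITE-6761, assuming blocks are numbered row by row with a fixed number of blocks per row. `BlockCoordinatesSketch` and its methods are hypothetical and do not reflect the actual SparseBlockMatrixStorage code.

```java
/** Hypothetical helper for illustration; not the Ignite block storage implementation. */
final class BlockCoordinatesSketch {
    /** Converts a row-major 1-dimensional block id into {blockRow, blockCol}. */
    static int[] toCoordinates(int blockId, int blocksInRow) {
        if (blockId < 0 || blocksInRow <= 0)
            throw new IllegalArgumentException("blockId must be >= 0 and blocksInRow must be > 0");
        return new int[] {blockId / blocksInRow, blockId % blocksInRow};
    }

    /** Converts 2-dimensional block coordinates back into a row-major 1-dimensional block id. */
    static int toBlockId(int blockRow, int blockCol, int blocksInRow) {
        if (blockRow < 0 || blockCol < 0 || blockCol >= blocksInRow)
            throw new IllegalArgumentException("coordinates are outside the block grid");
        return blockRow * blocksInRow + blockCol;
    }
}
```

Storing the pair directly as the key removes the need for this arithmetic at every access, which is the simplification the ticket asks for.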
[jira] [Created] (IGNITE-6805) Tests are red for another MAX_BLOCK_SIZE value (4 or 8 instead 32)
Aleksey Zinoviev created IGNITE-6805: Summary: Tests are red for another MAX_BLOCK_SIZE value (4 or 8 instead 32) Key: IGNITE-6805 URL: https://issues.apache.org/jira/browse/IGNITE-6805 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Affects Versions: 2.4 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev
In SparseDistributedBlockMatrixTest the following tests are red:
* testCacheBehaviour
* testSquareMatrixTimes
when the constant MAX_BLOCK_SIZE is set to another value (4 or 8, for example). In my opinion, this means that the algorithm is incorrect for matrices with a large number of blocks.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6968) Move similar Cache configurations in matrices and models to one Java or XML config
Aleksey Zinoviev created IGNITE-6968: Summary: Move similar Cache configurations in matrices and models to one Java or XML config Key: IGNITE-6968 URL: https://issues.apache.org/jira/browse/IGNITE-6968 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev
There are a lot of copy-pasted cache configs in matrices and vectors, in the method newCache(), which returns a configured cache for different data structures. For example:
* SparseDistributedMatrixStorage
* BlockVectorStorage
* BlockMatrixStorage
* SplitCache
* FeatureCache
* ProjectionCache
* SparseDistributedVectorStorage
and others.
Also, all cache usage strategies should be documented better (with a description of how to choose one parameter value or another).
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6969) Move constants with influence on performance to separate config
Aleksey Zinoviev created IGNITE-6969: Summary: Move constants with influence on performance to separate config Key: IGNITE-6969 URL: https://issues.apache.org/jira/browse/IGNITE-6969 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Priority: Minor Move constants like BLOCK_SIZE in block matrix and block vector to a separate config. A few constants in Decision Trees can also be placed there. Motivation: developers can tune these parameters to increase throughput. Comment: we need a more detailed review to find other constants that developers could change or override. Please add them in the comments. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-7022) Use QuadTree for kNN performance
Aleksey Zinoviev created IGNITE-7022: Summary: Use QuadTree for kNN performance Key: IGNITE-7022 URL: https://issues.apache.org/jira/browse/IGNITE-7022 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Priority: Minor The current kNN implementation is not very fast. Its performance could be improved with a quadtree [https://en.wikipedia.org/wiki/Quadtree]. Benchmarks should be provided as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-7025) Implement different strategies to fill missed data in LabeledDataset during loading from file
Aleksey Zinoviev created IGNITE-7025: Summary: Implement different strategies to fill missed data in LabeledDataset during loading from file Key: IGNITE-7025 URL: https://issues.apache.org/jira/browse/IGNITE-7025 Project: Ignite Issue Type: Task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Priority: Minor
For example, there could be four strategies:

    public enum FillMissingValueWith {
        /**
         * Fill a missing value with zero, an empty string, or the default value for categorical features.
         */
        ZERO,

        /**
         * Fill a missing value with the mean of the column.
         * Requires additional time to calculate.
         */
        MEAN,

        /**
         * Fill a missing value with the mode of the column.
         * Requires additional time to calculate.
         */
        MODE,

        /**
         * Delete observations with missing values.
         * Transforms the dataset and changes indexing.
         */
        DELETE
    }

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
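As an illustration of the MEAN strategy from IGNITE-7025, the sketch below fills gaps in a single feature column. It assumes missing values are represented as NaN and falls back to zero when the column has no observed values; both choices, and the class name `MissingValueFillSketch`, are assumptions rather than LabeledDataset behaviour.

```java
/** Hypothetical helper for illustration; not the LabeledDataset loader. */
final class MissingValueFillSketch {
    /** Returns a copy of the column where NaN entries are replaced by the mean of the present values. */
    static double[] fillWithMean(double[] column) {
        double sum = 0.0;
        int cnt = 0;

        // First pass: accumulate only the observed (non-NaN) values.
        for (double v : column) {
            if (!Double.isNaN(v)) {
                sum += v;
                cnt++;
            }
        }

        // Assumption: with no observed values there is no mean, so fall back to zero.
        double mean = cnt == 0 ? 0.0 : sum / cnt;

        // Second pass: fill the gaps in a copy so the input stays untouched.
        double[] res = column.clone();
        for (int i = 0; i < res.length; i++) {
            if (Double.isNaN(res[i]))
                res[i] = mean;
        }
        return res;
    }
}
```

The extra pass over the column is the "additional time to calculate" that the enum comments mention for MEAN and MODE.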
[jira] [Created] (IGNITE-7079) Add examples for kNN classification and for kNN regression
Aleksey Zinoviev created IGNITE-7079: Summary: Add examples for kNN classification and for kNN regression Key: IGNITE-7079 URL: https://issues.apache.org/jira/browse/IGNITE-7079 Project: Ignite Issue Type: Task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Should contain 4 examples covering the weighted and simple versions of both algorithms. It should also show Normalization usage. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-7316) Make Linear SVM for binary classification
Aleksey Zinoviev created IGNITE-7316: Summary: Make Linear SVM for binary classification Key: IGNITE-7316 URL: https://issues.apache.org/jira/browse/IGNITE-7316 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev
It should contain:
1. dataset for tests
2. loss function
3. binary classification metric (ROC AUC, for example)
4. common SVM model
5. SVM Linear Binary Trainer
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-7317) Make SVM Linear example for binary classification
Aleksey Zinoviev created IGNITE-7317: Summary: Make SVM Linear example for binary classification Key: IGNITE-7317 URL: https://issues.apache.org/jira/browse/IGNITE-7317 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Play with the params and different datasets in the example. Optional: it could be compared with the kNN classification method. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-7325) Remove ExternalizeTest from all test and change on ExternalizableTest
Aleksey Zinoviev created IGNITE-7325: Summary: Remove ExternalizeTest from all test and change on ExternalizableTest Key: IGNITE-7325 URL: https://issues.apache.org/jira/browse/IGNITE-7325 Project: Ignite Issue Type: New Feature Components: ml Reporter: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-7327) Add CSV loading to Labeled Dataset with Loader
Aleksey Zinoviev created IGNITE-7327: Summary: Add CSV loading to Labeled Dataset with Loader Key: IGNITE-7327 URL: https://issues.apache.org/jira/browse/IGNITE-7327 Project: Ignite Issue Type: New Feature Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev
Comment from [~dmitrievanthony]: Lots of datasets (from Kaggle, for example) are supplied in CSV format with a header line. In that connection, does it make sense to:
* use some proper CSV parsing (it's a bit more complicated than just splitting by comma)?
* add the ability to use the first header line as a source for the so-called feature names?
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-7328) Improve Labeled Dataset loading from txt file
Aleksey Zinoviev created IGNITE-7328: Summary: Improve Labeled Dataset loading from txt file Key: IGNITE-7328 URL: https://issues.apache.org/jira/browse/IGNITE-7328 Project: Ignite Issue Type: New Feature Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev
1. Wouldn't it be better to parse rows in place (rather than saving them as strings first)? In the current implementation we need to keep the dataset in memory twice, which might be a problem for big datasets.
2. What about the case when a dataset contains more than just numerical data? Do we consider this case, or will some other "DatasetLoader" be used for such purposes?
3. Just an idea: in case we don't want to fail on bad data (99% of cases), it would be great to understand the quality of the loaded dataset, such as the number of missed rows/values.
4. Should a row that doesn't contain the required number of columns be considered "bad data" rather than breaking parsing with an IndexOutOfBoundsException?
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
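Points 1, 3 and 4 of IGNITE-7328 could look roughly like the sketch below: each line is parsed as soon as it is read, and a row with the wrong number of columns or a non-numeric cell is counted as bad data instead of raising an exception. `TolerantRowParserSketch` is a hypothetical class, not the existing loader, and treating every non-numeric cell as bad data is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical loader sketch for illustration; not the Ignite LabeledDataset loader. */
final class TolerantRowParserSketch {
    private final int expectedCols;
    private final Pattern separator;
    private final List<double[]> rows = new ArrayList<>();
    private int skippedRows;

    TolerantRowParserSketch(int expectedCols, String separator) {
        this.expectedCols = expectedCols;
        // Quote the separator so characters such as '|' are not treated as regex syntax.
        this.separator = Pattern.compile(Pattern.quote(separator));
    }

    /** Parses one line in place; a malformed line is counted instead of throwing. */
    void accept(String line) {
        // Limit -1 keeps trailing empty cells, so "1.0," still yields two cells.
        String[] cells = separator.split(line, -1);
        if (cells.length != expectedCols) {
            skippedRows++;
            return;
        }

        double[] row = new double[expectedCols];
        try {
            for (int i = 0; i < expectedCols; i++)
                row[i] = Double.parseDouble(cells[i].trim());
        }
        catch (NumberFormatException e) {
            // Assumption: any non-numeric cell marks the whole row as bad data.
            skippedRows++;
            return;
        }
        rows.add(row);
    }

    /** Successfully parsed rows. */
    List<double[]> rows() {
        return rows;
    }

    /** Number of rows rejected as bad data, a simple quality indicator for the loaded dataset. */
    int skippedRows() {
        return skippedRows;
    }
}
```

Feeding lines one at a time, for example from a BufferedReader, means the raw strings never need to be held in memory alongside the parsed values.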
[jira] [Created] (IGNITE-7451) Make Linear SVM for multi-classification
Aleksey Zinoviev created IGNITE-7451: Summary: Make Linear SVM for multi-classification Key: IGNITE-7451 URL: https://issues.apache.org/jira/browse/IGNITE-7451 Project: Ignite Issue Type: Sub-task Components: ml Environment: Compare and choose one of the approaches: one-against-one or one-against-the-rest. Read the paper [https://www.csie.ntu.edu.tw/~cjlin/papers/multisvm.pdf] Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7452) Make Linear SVM example for multi - classification
Aleksey Zinoviev created IGNITE-7452: Summary: Make Linear SVM example for multi - classification Key: IGNITE-7452 URL: https://issues.apache.org/jira/browse/IGNITE-7452 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev For example, add an example on the Iris dataset (and compare it with kNN). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7532) kNN Documentation
Aleksey Zinoviev created IGNITE-7532: Summary: kNN Documentation Key: IGNITE-7532 URL: https://issues.apache.org/jira/browse/IGNITE-7532 Project: Ignite Issue Type: Task Components: documentation Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.4 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7702) Adopt KNN classification to the new Dataset from dataset package
Aleksey Zinoviev created IGNITE-7702: Summary: Adopt KNN classification to the new Dataset from dataset package Key: IGNITE-7702 URL: https://issues.apache.org/jira/browse/IGNITE-7702 Project: Ignite Issue Type: Improvement Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7796) Adopt kNN classification example to the new datasets
Aleksey Zinoviev created IGNITE-7796: Summary: Adopt kNN classification example to the new datasets Key: IGNITE-7796 URL: https://issues.apache.org/jira/browse/IGNITE-7796 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7797) Adopt yardstick tests for the new version of kNN classification algorithm
Aleksey Zinoviev created IGNITE-7797: Summary: Adopt yardstick tests for the new version of kNN classification algorithm Key: IGNITE-7797 URL: https://issues.apache.org/jira/browse/IGNITE-7797 Project: Ignite Issue Type: Sub-task Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-12148) [ML] Recommendation Engine
Aleksey Zinoviev created IGNITE-12148: Summary: [ML] Recommendation Engine Key: IGNITE-12148 URL: https://issues.apache.org/jira/browse/IGNITE-12148 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 The main idea is to provide a recommendation engine for building a recommendation system over the Ignite cache and via SQL operators. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IGNITE-12168) [ML] Flaky ML example tests
Aleksey Zinoviev created IGNITE-12168: Summary: [ML] Flaky ML example tests Key: IGNITE-12168 URL: https://issues.apache.org/jira/browse/IGNITE-12168 Project: Ignite Issue Type: Bug Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Discussed here: [http://apache-ignite-developers.2346864.n4.nabble.com/After-IGNITE-12148-the-Examples-suite-has-unstable-tests-td43469.html] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IGNITE-12180) [ML] Add support of the next Imputing Strategies: MIN, MAX
Aleksey Zinoviev created IGNITE-12180: Summary: [ML] Add support of the next Imputing Strategies: MIN, MAX Key: IGNITE-12180 URL: https://issues.apache.org/jira/browse/IGNITE-12180 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Add support for the following imputing strategies: MIN, MAX. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IGNITE-12216) [ML][Umbrella]
Aleksey Zinoviev created IGNITE-12216: Summary: [ML][Umbrella] Key: IGNITE-12216 URL: https://issues.apache.org/jira/browse/IGNITE-12216 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Discussion here: [http://apache-ignite-developers.2346864.n4.nabble.com/ML-DISCUSSION-Big-Double-problem-td42262.html#a42267] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12217) [ML] Add support for label encoding
Aleksey Zinoviev created IGNITE-12217: Summary: [ML] Add support for label encoding Key: IGNITE-12217 URL: https://issues.apache.org/jira/browse/IGNITE-12217 Project: Ignite Issue Type: Sub-task Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Support training on the Mushroom dataset. See this part of the discussion: "My dataset is Mushrooms <[https://www.kaggle.com/uciml/mushroom-classification]> dataset from Kaggle. There are only categorical features and categorical labels." -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12218) [ML] Add support for Strings in Vectorizer
Aleksey Zinoviev created IGNITE-12218: Summary: [ML] Add support for Strings in Vectorizer Key: IGNITE-12218 URL: https://issues.apache.org/jira/browse/IGNITE-12218 Project: Ignite Issue Type: Sub-task Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Currently the vectorizer signatures are limited; they should be extended to support Strings. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-10528) [ML] Fix incorrect comparing of double values in ML examples
Aleksey Zinoviev created IGNITE-10528: Summary: [ML] Fix incorrect comparing of double values in ML examples Key: IGNITE-10528 URL: https://issues.apache.org/jira/browse/IGNITE-10528 Project: Ignite Issue Type: Bug Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Look at the code line if (groundTruth != prediction) in each example. Fix it with Math.abs or the Double.compare method (don't forget about precision). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
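The Math.abs variant suggested in IGNITE-10528 can be sketched as a tolerance check. The helper name `DoubleCompareSketch.sameLabel` and the idea of passing the precision as a parameter are assumptions for illustration; the ticket does not prescribe a particular tolerance.

```java
/** Hypothetical helper for illustration; not taken from the Ignite examples. */
final class DoubleCompareSketch {
    /** Returns true when the two values differ by no more than the given precision. */
    static boolean sameLabel(double groundTruth, double prediction, double precision) {
        return Math.abs(groundTruth - prediction) <= precision;
    }
}
```

In an example, the check `if (groundTruth != prediction)` would then become `if (!DoubleCompareSketch.sameLabel(groundTruth, prediction, 1e-9))`, where the tolerance value is a choice left to the example author.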
[jira] [Created] (IGNITE-10529) [ML] Add Confusion Matrix support for classification algorithms
Aleksey Zinoviev created IGNITE-10529: Summary: [ML] Add Confusion Matrix support for classification algorithms Key: IGNITE-10529 URL: https://issues.apache.org/jira/browse/IGNITE-10529 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 This is an umbrella ticket for Confusion Matrix support. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10530) [ML] Add Confusion Matrix for Binary Classification
Aleksey Zinoviev created IGNITE-10530: Summary: [ML] Add Confusion Matrix for Binary Classification Key: IGNITE-10530 URL: https://issues.apache.org/jira/browse/IGNITE-10530 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Add a special class that builds the confusion matrix as a product of the evaluation process. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10531) [ML] Refactor all examples to use Binary Confusion Matrix instead of calculations by hand
Aleksey Zinoviev created IGNITE-10531: Summary: [ML] Refactor all examples to use Binary Confusion Matrix instead of calculations by hand Key: IGNITE-10531 URL: https://issues.apache.org/jira/browse/IGNITE-10531 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8
Change

    // Build confusion matrix. See https://en.wikipedia.org/wiki/Confusion_matrix
    int[][] confusionMtx = {{0, 0}, {0, 0}};

to usage of ConfusionMatrix.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
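To show the kind of class that could replace the hand-maintained int[][] array in IGNITE-10531, here is a small stand-alone sketch. `BinaryConfusionMatrixSketch` and its methods are hypothetical and are not the Ignite ConfusionMatrix API; labels are modelled as booleans purely to keep the example short.

```java
/** Hypothetical class for illustration; not the Ignite ConfusionMatrix implementation. */
final class BinaryConfusionMatrixSketch {
    private long truePositives;
    private long falsePositives;
    private long trueNegatives;
    private long falseNegatives;

    /** Records one observation: the ground-truth label and the predicted label. */
    void add(boolean truth, boolean predicted) {
        if (truth && predicted)
            truePositives++;
        else if (!truth && predicted)
            falsePositives++;
        else if (!truth)
            trueNegatives++;
        else
            falseNegatives++;
    }

    /** Share of correct predictions; NaN when nothing has been recorded. */
    double accuracy() {
        long total = truePositives + falsePositives + trueNegatives + falseNegatives;
        return (double) (truePositives + trueNegatives) / total;
    }

    /** TP / (TP + FP); NaN when there are no positive predictions. */
    double precision() {
        return (double) truePositives / (truePositives + falsePositives);
    }

    /** TP / (TP + FN); NaN when there are no positive ground-truth labels. */
    double recall() {
        return (double) truePositives / (truePositives + falseNegatives);
    }
}
```

Keeping the four counters behind named methods avoids the index bookkeeping that the two-dimensional array requires in every example.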
[jira] [Created] (IGNITE-10532) [ML] Add Confusion Matrix for multi-class classification
Aleksey Zinoviev created IGNITE-10532: Summary: [ML] Add Confusion Matrix for multi-class classification Key: IGNITE-10532 URL: https://issues.apache.org/jira/browse/IGNITE-10532 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Explore the ability to integrate OneVsRest with the ConfusionMatrix calculation. Note that it can be implemented only after MultiClassEvaluator (no ticket yet). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10605) [ML] Add multiple metrics calculations to Cross-Validation
Aleksey Zinoviev created IGNITE-10605: - Summary: [ML] Add multiple metrics calculations to Cross-Validation Key: IGNITE-10605 URL: https://issues.apache.org/jira/browse/IGNITE-10605 Project: Ignite Issue Type: Improvement Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Extend and refactor the CrossValidation class methods that take a scoreCalculator parameter. Refactor the tests, examples and tutorial according to the new changes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10606) [ML] Add tests for Evaluator
Aleksey Zinoviev created IGNITE-10606: - Summary: [ML] Add tests for Evaluator Key: IGNITE-10606 URL: https://issues.apache.org/jira/browse/IGNITE-10606 Project: Ignite Issue Type: Task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Cover the Evaluator static methods with tests. They should be simple tests, smaller than the Evaluator example. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10697) [ML] Add Frequency Encoding
Aleksey Zinoviev created IGNITE-10697: - Summary: [ML] Add Frequency Encoding Key: IGNITE-10697 URL: https://issues.apache.org/jira/browse/IGNITE-10697 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Encode the values to a fraction of all the labels. Can work with linear models if the frequency is correlated with the target value. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
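As a rough illustration of the frequency encoding idea in IGNITE-10697, the sketch below maps each categorical value to its share of all observed values. The class `FrequencyEncodingSketch` and its methods are hypothetical names invented for this example, and encoding unseen values as 0.0 is an assumption; nothing here reflects the eventual Ignite ML preprocessor API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of frequency encoding; not an Ignite ML class. */
final class FrequencyEncodingSketch {
    /** Learns value -> relative frequency (count / total) from the observed values. */
    static Map<String, Double> fit(List<String> values) {
        Map<String, Double> freq = new HashMap<>();
        for (String v : values)
            freq.merge(v, 1.0, Double::sum);
        final double total = values.size();
        freq.replaceAll((value, cnt) -> cnt / total);
        return freq;
    }

    /** Encodes one value; values never seen during fit are encoded as 0.0 (an assumption). */
    static double encode(Map<String, Double> freq, String value) {
        return freq.getOrDefault(value, 0.0);
    }

    public static void main(String[] args) {
        Map<String, Double> freq = fit(Arrays.asList("red", "red", "green", "blue"));
        System.out.println("red -> " + encode(freq, "red"));
    }
}
```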
[jira] [Created] (IGNITE-10711) [ML] [Umbrella] Provide metrics to evaluate the quality of model
Aleksey Zinoviev created IGNITE-10711: - Summary: [ML] [Umbrella] Provide metrics to evaluate the quality of model Key: IGNITE-10711 URL: https://issues.apache.org/jira/browse/IGNITE-10711 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 This is an umbrella ticket for all metric-related tickets -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10713) [ML] Refactor examples with accuracy calculation and another metrics usage
Aleksey Zinoviev created IGNITE-10713: - Summary: [ML] Refactor examples with accuracy calculation and another metrics usage Key: IGNITE-10713 URL: https://issues.apache.org/jira/browse/IGNITE-10713 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Avoid manual calculation of accuracy, use evaluator instead of counters in examples -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10792) [ML] Add seed to test-train filter
Aleksey Zinoviev created IGNITE-10792: - Summary: [ML] Add seed to test-train filter Key: IGNITE-10792 URL: https://issues.apache.org/jira/browse/IGNITE-10792 Project: Ignite Issue Type: Task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Results need to be reproducible from run to run in the second Evaluator test -- This message was sent by Atlassian JIRA (v7.6.3#76005)
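The reason a seed helps can be shown with a small sketch: a train/test filter driven by a `java.util.Random` built from a fixed seed assigns the same rows to the training set on every run. The class `SeededSplitSketch` and the method `trainMask` are hypothetical names for this illustration and do not describe the Ignite ML split API.

```java
import java.util.Random;

/** Hypothetical illustration of a seeded train/test filter; not an Ignite ML class. */
final class SeededSplitSketch {
    /**
     * Returns a mask with one flag per row: true means the row goes to the training set.
     * The same (n, trainRatio, seed) triple always yields the same mask.
     */
    static boolean[] trainMask(int n, double trainRatio, long seed) {
        Random rnd = new Random(seed);
        boolean[] mask = new boolean[n];
        for (int i = 0; i < n; i++)
            mask[i] = rnd.nextDouble() < trainRatio;
        return mask;
    }

    public static void main(String[] args) {
        boolean[] mask = trainMask(10, 0.75, 42L);
        int train = 0;
        for (boolean b : mask)
            if (b) train++;
        System.out.println("rows in train: " + train + " of " + mask.length);
    }
}
```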
[jira] [Created] (IGNITE-10803) [ML] Add prototype LinearRegression loading from PMML format
Aleksey Zinoviev created IGNITE-10803: - Summary: [ML] Add prototype LinearRegression loading from PMML format Key: IGNITE-10803 URL: https://issues.apache.org/jira/browse/IGNITE-10803 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Generate or get existing PMML model for known dataset to load and predict new data in Ignite -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10804) [ML] Add ability to load LinReg model from Spark to Ignite via PMML
Aleksey Zinoviev created IGNITE-10804: - Summary: [ML] Add ability to load LinReg model from Spark to Ignite via PMML Key: IGNITE-10804 URL: https://issues.apache.org/jira/browse/IGNITE-10804 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev 1) Write simple ML pipeline for Spark 2) Convert to PMML model 3) Load to Ignite 4) Predict on Ignite -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10865) [ML] [Umbrella] Integration with Spark ML
Aleksey Zinoviev created IGNITE-10865: - Summary: [ML] [Umbrella] Integration with Spark ML Key: IGNITE-10865 URL: https://issues.apache.org/jira/browse/IGNITE-10865 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Investigate how to load ML models from Spark -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10866) [ML] Add an example of LogRegression model loading
Aleksey Zinoviev created IGNITE-10866: - Summary: [ML] Add an example of LogRegression model loading Key: IGNITE-10866 URL: https://issues.apache.org/jira/browse/IGNITE-10866 Project: Ignite Issue Type: Sub-task Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Load the LogReg model from Spark via Spark ML Writable to parquet file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10869) [ML] Add MultiClass classification metrics
Aleksey Zinoviev created IGNITE-10869: - Summary: [ML] Add MultiClass classification metrics Key: IGNITE-10869 URL: https://issues.apache.org/jira/browse/IGNITE-10869 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Add the ability to calculate multiple metrics (as binary metrics) for multiclass classification. It can be merged with the OneVsRest approach. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10870) [ML] Add an example for KNN/LogReg and multi-class task full Iris dataset
Aleksey Zinoviev created IGNITE-10870: - Summary: [ML] Add an example for KNN/LogReg and multi-class task full Iris dataset Key: IGNITE-10870 URL: https://issues.apache.org/jira/browse/IGNITE-10870 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Add one or two examples for KNN/LogReg on the Iris dataset with 3 classes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10901) [ML][Umbrella] Add support of regression metrics to evaluate regression
Aleksey Zinoviev created IGNITE-10901: - Summary: [ML][Umbrella] Add support of regression metrics to evaluate regression Key: IGNITE-10901 URL: https://issues.apache.org/jira/browse/IGNITE-10901 Project: Ignite Issue Type: Improvement Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Look at scikit-learn metrics like
Regression (scoring name: function, link)
‘explained_variance’: metrics.explained_variance_score, https://scikit-learn.org/stable/modules/generated/sklearn.metrics.explained_variance_score.html#sklearn.metrics.explained_variance_score
‘neg_mean_absolute_error’: metrics.mean_absolute_error, https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_error.html#sklearn.metrics.mean_absolute_error
‘neg_mean_squared_error’: metrics.mean_squared_error, https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error
‘neg_mean_squared_log_error’: metrics.mean_squared_log_error, https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_log_error.html#sklearn.metrics.mean_squared_log_error
‘neg_median_absolute_error’: metrics.median_absolute_error, https://scikit-learn.org/stable/modules/generated/sklearn.metrics.median_absolute_error.html#sklearn.metrics.median_absolute_error
‘r2’: metrics.r2_score, https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html#sklearn.metrics.r2_score
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
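For orientation, three of the regression metrics listed in IGNITE-10901 (mean absolute error, mean squared error and R2) can be sketched in a few lines using their textbook definitions. The class `RegressionMetricsSketch` and its method names are hypothetical; this is not the RegressionMetrics class the tickets propose, only an illustration of what such a class would compute.

```java
/** Hypothetical illustration of regression metrics; not the Ignite ML RegressionMetrics class. */
final class RegressionMetricsSketch {
    /** Mean absolute error: average of |truth - prediction|. */
    static double mae(double[] truth, double[] pred) {
        double sum = 0.0;
        for (int i = 0; i < truth.length; i++)
            sum += Math.abs(truth[i] - pred[i]);
        return sum / truth.length;
    }

    /** Mean squared error: average of (truth - prediction)^2. */
    static double mse(double[] truth, double[] pred) {
        double sum = 0.0;
        for (int i = 0; i < truth.length; i++) {
            double d = truth[i] - pred[i];
            sum += d * d;
        }
        return sum / truth.length;
    }

    /** R2 = 1 - SS_res / SS_tot; assumes the truth values are not all identical. */
    static double r2(double[] truth, double[] pred) {
        double mean = 0.0;
        for (double v : truth)
            mean += v;
        mean /= truth.length;

        double ssRes = 0.0;
        double ssTot = 0.0;
        for (int i = 0; i < truth.length; i++) {
            ssRes += (truth[i] - pred[i]) * (truth[i] - pred[i]);
            ssTot += (truth[i] - mean) * (truth[i] - mean);
        }
        return 1.0 - ssRes / ssTot;
    }

    public static void main(String[] args) {
        double[] truth = {1.0, 2.0, 3.0};
        double[] pred = {1.5, 2.0, 2.5};
        System.out.println("MAE = " + mae(truth, pred) + ", MSE = " + mse(truth, pred) + ", R2 = " + r2(truth, pred));
    }
}
```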
[jira] [Created] (IGNITE-10902) [ML] Implement a few regression metrics in one RegressionMetrics class
Aleksey Zinoviev created IGNITE-10902: - Summary: [ML] Implement a few regression metrics in one RegressionMetrics class Key: IGNITE-10902 URL: https://issues.apache.org/jira/browse/IGNITE-10902 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Look for possible metrics in Spark, Smile, Scikit-learn -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10903) [ML] Provide an example with training of regression model and its evaluation
Aleksey Zinoviev created IGNITE-10903: - Summary: [ML] Provide an example with training of regression model and its evaluation Key: IGNITE-10903 URL: https://issues.apache.org/jira/browse/IGNITE-10903 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 It could be parametric or non-parametric regression -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10904) [ML] Refactor all examples with regression to use RegressionMetrics
Aleksey Zinoviev created IGNITE-10904: - Summary: [ML] Refactor all examples with regression to use RegressionMetrics Key: IGNITE-10904 URL: https://issues.apache.org/jira/browse/IGNITE-10904 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Look for all regression examples and add as a final step the RegressionMetrics usage -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10968) [ML] Create new ignite module SparkMLModelImport and add LogRegression converter
Aleksey Zinoviev created IGNITE-10968: - Summary: [ML] Create new ignite module SparkMLModelImport and add LogRegression converter Key: IGNITE-10968 URL: https://issues.apache.org/jira/browse/IGNITE-10968 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev * Create new module * Add specific dependencies (ml/hadoop/spark/parquet) * Move LogRegression example to this module -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11000) [ML] Add parser for Spark LinearRegression
Aleksey Zinoviev created IGNITE-11000: - Summary: [ML] Add parser for Spark LinearRegression Key: IGNITE-11000 URL: https://issues.apache.org/jira/browse/IGNITE-11000 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 # Write Spark example producing LinearRegression model # Save model to parquet file # Parse parquet file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11001) [ML] Add parser for Spark Linear SVM model
Aleksey Zinoviev created IGNITE-11001: - Summary: [ML] Add parser for Spark Linear SVM model Key: IGNITE-11001 URL: https://issues.apache.org/jira/browse/IGNITE-11001 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 # Write Spark example producing Linear SVM model # Save model to parquet file # Parse parquet file # Add an example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11002) [ML] Add parser for Spark Decision tree classifier model
Aleksey Zinoviev created IGNITE-11002: - Summary: [ML] Add parser for Spark Decision tree classifier model Key: IGNITE-11002 URL: https://issues.apache.org/jira/browse/IGNITE-11002 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 # Write Spark example producing Decision tree classifier model # Save model to parquet file # Parse parquet file # Add an example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11003) [ML] Add parser for Spark Random forest classifier
Aleksey Zinoviev created IGNITE-11003: - Summary: [ML] Add parser for Spark Random forest classifier Key: IGNITE-11003 URL: https://issues.apache.org/jira/browse/IGNITE-11003 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11005) [ML] Add parser for Spark Gradient-boosted tree classifier
Aleksey Zinoviev created IGNITE-11005: - Summary: [ML] Add parser for Spark Gradient-boosted tree classifier Key: IGNITE-11005 URL: https://issues.apache.org/jira/browse/IGNITE-11005 Project: Ignite Issue Type: Sub-task Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev # Write Spark example producing Gradient-boosted tree classifier model # Save model to parquet file # Parse parquet file # Add an example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11012) [ML] Add model type validation during parsing parquet file
Aleksey Zinoviev created IGNITE-11012: - Summary: [ML] Add model type validation during parsing parquet file Key: IGNITE-11012 URL: https://issues.apache.org/jira/browse/IGNITE-11012 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 After resolving the Ignite path, check a special field in the parquet file to validate that the appropriate model is being loaded. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11037) [ML] Add parser for Spark KMeans clustering model
Aleksey Zinoviev created IGNITE-11037: - Summary: [ML] Add parser for Spark KMeans clustering model Key: IGNITE-11037 URL: https://issues.apache.org/jira/browse/IGNITE-11037 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11039) [ML] Add parser for Spark Decision tree regression
Aleksey Zinoviev created IGNITE-11039: - Summary: [ML] Add parser for Spark Decision tree regression Key: IGNITE-11039 URL: https://issues.apache.org/jira/browse/IGNITE-11039 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 # Write Spark example producing Decision Tree Regressor # Save model to parquet file # Parse parquet file # Add an example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11040) [ML] Add parser for Spark Random forest regressor
Aleksey Zinoviev created IGNITE-11040: - Summary: [ML] Add parser for Spark Random forest regressor Key: IGNITE-11040 URL: https://issues.apache.org/jira/browse/IGNITE-11040 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 # Write Spark example producing Random Forest regressor # Save model to parquet file # Parse parquet file # Add an example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11041) [ML] Add parser for Spark Gradient-boosted tree regressor
Aleksey Zinoviev created IGNITE-11041: - Summary: [ML] Add parser for Spark Gradient-boosted tree regressor Key: IGNITE-11041 URL: https://issues.apache.org/jira/browse/IGNITE-11041 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 # Write Spark example producing Gradient-boosted tree regressor # Save model to parquet file # Parse parquet file # Add an example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11244) [ML] Improve model loading from directory instead full path to file with model
Aleksey Zinoviev created IGNITE-11244: - Summary: [ML] Improve model loading from directory instead full path to file with model Key: IGNITE-11244 URL: https://issues.apache.org/jira/browse/IGNITE-11244 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev The proposed feature should support auto-discovering of Spark models in the suggested directories -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11294) [ML] Use ML logger and env variables in Spark ML Parser
Aleksey Zinoviev created IGNITE-11294: - Summary: [ML] Use ML logger and env variables in Spark ML Parser Key: IGNITE-11294 URL: https://issues.apache.org/jira/browse/IGNITE-11294 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Add logger to SparkModelParser class and environment usage -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11295) [ML] Add readme file to SparkModelParser module
Aleksey Zinoviev created IGNITE-11295: - Summary: [ML] Add readme file to SparkModelParser module Key: IGNITE-11295 URL: https://issues.apache.org/jira/browse/IGNITE-11295 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 This file should contain usage examples and instructions on how to use this module -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11680) [ML] Improve ROC AUC to work with ProbableLabel
Aleksey Zinoviev created IGNITE-11680: - Summary: [ML] Improve ROC AUC to work with ProbableLabel Key: IGNITE-11680 URL: https://issues.apache.org/jira/browse/IGNITE-11680 Project: Ignite Issue Type: Sub-task Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev The ROC AUC implementation is ready to work with ProbableLabel instead of a binary label (0.0/1.0). It should work in the future for multi-class classification tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-12079) [ML][Umbrella] Add advanced preprocessing techniques
Aleksey Zinoviev created IGNITE-12079: - Summary: [ML][Umbrella] Add advanced preprocessing techniques Key: IGNITE-12079 URL: https://issues.apache.org/jira/browse/IGNITE-12079 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 *Main goal:* To reduce the gap between Apache Spark and Apache Ignite in preprocessing operations. Reducing the gap could help with loading Spark ML Pipelines into Ignite ML. Next steps: # Add Frequency Encoder # Add two Imputing Strategies (MIN, MAX, COUNT, MOST_FREQUENT, LEAST_FREQUENT) # Add RobustScaler (will be added in Spark 3.0) # Add CountVectorizer # Add FeatureHasher # Add QuantileDiscretizer # Add Locality Sensitive Hashing (LSH) # Add LabelEncoder # Add RevertStringIndexing # Add multi-column preprocessor -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IGNITE-9239) [ML] KMeansTrainer crashed if amount of possible clusters more than amount of partitions in dataset
Aleksey Zinoviev created IGNITE-9239: Summary: [ML] KMeansTrainer crashed if amount of possible clusters more than amount of partitions in dataset Key: IGNITE-9239 URL: https://issues.apache.org/jira/browse/IGNITE-9239 Project: Ignite Issue Type: Bug Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev How to reproduce? Set the K parameter in KMeansTrainer to 100 and run KMeansClusterizationExample. The stack trace is:
Exception in thread "KMeansClusterizationExample-#44" java.lang.RuntimeException: java.lang.IllegalArgumentException: bound must be positive
    at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.fit(KMeansTrainer.java:112)
    at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.fit(KMeansTrainer.java:46)
    at org.apache.ignite.ml.trainers.DatasetTrainer.fit(DatasetTrainer.java:68)
    at org.apache.ignite.examples.ml.clustering.KMeansClusterizationExample.lambda$main$0(KMeansClusterizationExample.java:60)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: bound must be positive
    at java.util.Random.nextInt(Random.java:388)
    at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.initClusterCentersRandomly(KMeansTrainer.java:193)
    at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.fit(KMeansTrainer.java:86)
    ... 4 more
Possible solution: correct the mechanism of the rndPnts computation on lines 180-190 of KMeansTrainer -- This message was sent by Atlassian JIRA (v7.6.3#76005)
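The root exception in the trace above comes from `java.util.Random.nextInt(int)`, which throws IllegalArgumentException whenever its bound is zero or negative. The sketch below shows one way to guard a random pick against an empty partition and against asking for more points than exist. The class `RandomCenterPickSketch` and the method `pickIndices` are hypothetical and only illustrate the failure mode; this is not the actual KMeansTrainer code or its eventual fix.

```java
import java.util.Random;

/** Hypothetical illustration of guarding Random.nextInt; not the KMeansTrainer code. */
final class RandomCenterPickSketch {
    /**
     * Picks up to k random row indices from a partition of the given size.
     * Returns an empty array for an empty partition instead of calling nextInt(0),
     * which would throw "bound must be positive". Indices may repeat.
     */
    static int[] pickIndices(int partitionSize, int k, Random rnd) {
        if (partitionSize <= 0)
            return new int[0];

        int cnt = Math.min(k, partitionSize);
        int[] picked = new int[cnt];
        for (int i = 0; i < cnt; i++)
            picked[i] = rnd.nextInt(partitionSize);
        return picked;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1L);
        System.out.println("empty partition: " + pickIndices(0, 100, rnd).length + " indices");
        System.out.println("small partition: " + pickIndices(5, 100, rnd).length + " indices");
    }
}
```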
[jira] [Created] (IGNITE-9261) [ML] Add ANN algorithm based on ACD concept
Aleksey Zinoviev created IGNITE-9261: Summary: [ML] Add ANN algorithm based on ACD concept Key: IGNITE-9261 URL: https://issues.apache.org/jira/browse/IGNITE-9261 Project: Ignite Issue Type: New Feature Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev The ACD concept is implemented via centroids searching with KMeans help. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9281) [ML] Starter ML tasks
Aleksey Zinoviev created IGNITE-9281: Summary: [ML] Starter ML tasks Key: IGNITE-9281 URL: https://issues.apache.org/jira/browse/IGNITE-9281 Project: Ignite Issue Type: Wish Components: ml Reporter: Aleksey Zinoviev Fix For: None This is an umbrella ticket for ML starter tasks. Please contact [~zaleslaw] to be assigned one of these tasks and to get help with it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9282) [ML] Add Naive Bayes classifier
Aleksey Zinoviev created IGNITE-9282: Summary: [ML] Add Naive Bayes classifier Key: IGNITE-9282 URL: https://issues.apache.org/jira/browse/IGNITE-9282 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. So we want to add this algorithm to Apache Ignite ML module. Ideally, implementation should support both multinomial naive Bayes and Bernoulli naive Bayes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9283) [ML] Add Discrete Cosine preprocessor
Aleksey Zinoviev created IGNITE-9283: Summary: [ML] Add Discrete Cosine preprocessor Key: IGNITE-9283 URL: https://issues.apache.org/jira/browse/IGNITE-9283 Project: Ignite Issue Type: Sub-task Reporter: Aleksey Zinoviev Add [https://en.wikipedia.org/wiki/Discrete_cosine_transform] Please look at the MinMaxScaler or Normalization packages in preprocessing package. Add classes if required 1) Preprocessor 2) Trainer 3) custom PartitionData if shuffling is a step of algorithm -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9284) [ML] Add a Standard Scaler
Aleksey Zinoviev created IGNITE-9284: Summary: [ML] Add a Standard Scaler Key: IGNITE-9284 URL: https://issues.apache.org/jira/browse/IGNITE-9284 Project: Ignite Issue Type: Sub-task Reporter: Aleksey Zinoviev Add analogue of [http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html] Please look at the MinMaxScaler or Normalization packages in preprocessing package. Add classes if required 1) Preprocessor 2) Trainer 3) custom PartitionData if shuffling is a step of algorithm -- This message was sent by Atlassian JIRA (v7.6.3#76005)
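As a reference point for IGNITE-9284, standard scaling of a single feature column subtracts the column mean and divides by the column standard deviation. The sketch below uses the population standard deviation and leaves a constant column unscaled, which mirrors scikit-learn's documented behaviour. The class `StandardScalerSketch` is a hypothetical name; it does not follow the Preprocessor/Trainer structure the ticket asks for and works on a plain array rather than a partitioned dataset.

```java
/** Hypothetical single-column illustration of standard scaling; not an Ignite ML preprocessor. */
final class StandardScalerSketch {
    /** Returns (x - mean) / std for every element, using the population standard deviation. */
    static double[] fitTransform(double[] col) {
        int n = col.length;
        double[] out = new double[n];
        if (n == 0)
            return out;

        double mean = 0.0;
        for (double v : col)
            mean += v;
        mean /= n;

        double var = 0.0;
        for (double v : col)
            var += (v - mean) * (v - mean);
        var /= n;

        double std = Math.sqrt(var);
        if (std == 0.0)
            std = 1.0; // constant column: only center it, do not divide by zero

        for (int i = 0; i < n; i++)
            out[i] = (col[i] - mean) / std;
        return out;
    }

    public static void main(String[] args) {
        double[] scaled = fitTransform(new double[] {1.0, 2.0, 3.0});
        System.out.println(java.util.Arrays.toString(scaled));
    }
}
```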
[jira] [Created] (IGNITE-9285) [ML] Add MaxAbsScaler as a preprocessing stage
Aleksey Zinoviev created IGNITE-9285: Summary: [ML] Add MaxAbsScaler as a preprocessing stage Key: IGNITE-9285 URL: https://issues.apache.org/jira/browse/IGNITE-9285 Project: Ignite Issue Type: Sub-task Components: ml Reporter: Aleksey Zinoviev Add analogue of [http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html#sklearn.preprocessing.MaxAbsScaler] Please look at the MinMaxScaler or Normalization packages in preprocessing package. Add classes if required 1) Preprocessor 2) Trainer 3) custom PartitionData if shuffling is a step of algorithm -- This message was sent by Atlassian JIRA (v7.6.3#76005)
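For IGNITE-9285, max-abs scaling divides every value in a feature column by the largest absolute value in that column, so the result lies in [-1, 1] and zeros stay zero. The sketch below shows the transformation on a plain array. The class `MaxAbsScalerSketch` is a hypothetical name for illustration and says nothing about how the Ignite preprocessor and trainer would be structured.

```java
/** Hypothetical single-column illustration of max-abs scaling; not an Ignite ML preprocessor. */
final class MaxAbsScalerSketch {
    /** Returns x / max(|x|) for every element; an all-zero column is returned unchanged. */
    static double[] fitTransform(double[] col) {
        double maxAbs = 0.0;
        for (double v : col)
            maxAbs = Math.max(maxAbs, Math.abs(v));

        double[] out = new double[col.length];
        for (int i = 0; i < col.length; i++)
            out[i] = maxAbs == 0.0 ? col[i] : col[i] / maxAbs;
        return out;
    }

    public static void main(String[] args) {
        double[] scaled = fitTransform(new double[] {-4.0, 2.0, 1.0});
        System.out.println(java.util.Arrays.toString(scaled));
    }
}
```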
[jira] [Created] (IGNITE-9336) [ML] ANN/SVM Trainer tests produce unpredictable results due to random data generation
Aleksey Zinoviev created IGNITE-9336: Summary: [ML] ANN/SVM Trainer tests produce unpredictable results due to random data generation Key: IGNITE-9336 URL: https://issues.apache.org/jira/browse/IGNITE-9336 Project: Ignite Issue Type: Bug Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Remove random data generation and add static dataset into tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9393) [ML] KMeans fails on complex data in cache
Aleksey Zinoviev created IGNITE-9393: Summary: [ML] KMeans fails on complex data in cache Key: IGNITE-9393 URL: https://issues.apache.org/jira/browse/IGNITE-9393 Project: Ignite Issue Type: Bug Components: ml Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Described here http://apache-ignite-users.70518.x6.nabble.com/NPE-exception-in-KMeansTrainer-td23504.html#a23512 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9463) [ML] Update ML tutorial with new model composition/update features
Aleksey Zinoviev created IGNITE-9463: Summary: [ML] Update ML tutorial with new model composition/update features Key: IGNITE-9463 URL: https://issues.apache.org/jira/browse/IGNITE-9463 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.7 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.7 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9482) [ML] Refactor all trainers' setters to withFieldName format for meta-algorithms
Aleksey Zinoviev created IGNITE-9482: Summary: [ML] Refactor all trainers' setters to withFieldName format for meta-algorithms Key: IGNITE-9482 URL: https://issues.apache.org/jira/browse/IGNITE-9482 Project: Ignite Issue Type: Sub-task Components: ml Affects Versions: 2.7 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.7 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9497) [ML] Add Pipeline support to Cross-Validation process
Aleksey Zinoviev created IGNITE-9497: Summary: [ML] Add Pipeline support to Cross-Validation process Key: IGNITE-9497 URL: https://issues.apache.org/jira/browse/IGNITE-9497 Project: Ignite Issue Type: New Feature Components: ml Affects Versions: 2.8 Reporter: Aleksey Zinoviev Assignee: Aleksey Zinoviev Fix For: 2.8 Change the API of ParamGrid.addHyperParam to support meta-information about the Pipeline stage. Add a method to Cross-Validation that evaluates the whole Pipeline process and injects hyper-parameters from the ParamGrid. -- This message was sent by Atlassian JIRA (v7.6.3#76005)