[jira] [Commented] (FLINK-9664) FlinkML Quickstart Loading Data section example doesn't work as described

2018-07-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556434#comment-16556434
 ] 

ASF GitHub Bot commented on FLINK-9664:
---

walterddr opened a new pull request #6425: [FLINK-9664][Doc] fixing 
documentation in ML quick start
URL: https://github.com/apache/flink/pull/6425
 
 
   ## What is the purpose of the change
   
   * Fix the documentation to state explicitly that labels of +1 and -1 are 
required when using the SVM library in flink-ml
   
   ## Brief change log
   
   * Added the explicit conversion requirement to the documentation
   
   
   ## Verifying this change
   
   * n/a
   
   ## Does this pull request potentially affect one of the following parts:
   
   * n/a
   
   ## Documentation
   
   * n/a
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> FlinkML Quickstart Loading Data section example doesn't work as described
> -
>
> Key: FLINK-9664
> URL: https://issues.apache.org/jira/browse/FLINK-9664
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Machine Learning Library
>Affects Versions: 1.5.0
>Reporter: Mano Swerts
>Assignee: Rong Rong
>Priority: Major
>  Labels: documentation-update, machine_learning, ml, 
> pull-request-available
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The ML documentation example isn't complete: 
> [https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html#loading-data]
> The referred section loads data from an astroparticle binary classification 
> dataset to showcase SVM. The dataset uses 0 and 1 as labels, which doesn't 
> produce correct results. The SVM predictor expects -1 and 1 labels to 
> correctly predict the label. The documentation, however, doesn't mention 
> that, so the example fails without any indication of why.
> The documentation should be updated with an explicit mention of the -1 and 1 
> labels and a mapping function that shows the conversion of the labels.
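
A minimal sketch of the kind of mapping function the issue asks for, written in plain Scala so that it runs without Flink on the classpath. `Labeled` is a hypothetical stand-in for FlinkML's `LabeledVector`, `toSvmLabel` is an illustrative name rather than a library function, and the feature values are made up:

```scala
// Hypothetical stand-in for FlinkML's LabeledVector: a label plus feature values.
// (Vector here is Scala's immutable Vector, not FlinkML's Vector type.)
final case class Labeled(label: Double, features: Vector[Double])

object LabelMapping {
  // Map any positive label to +1.0 and everything else (here: 0) to -1.0.
  def toSvmLabel(lv: Labeled): Labeled =
    lv.copy(label = if (lv.label > 0.0) 1.0 else -1.0)

  def main(args: Array[String]): Unit = {
    // Made-up sample rows using the 0/1 labelling described in the issue.
    val raw = Seq(
      Labeled(1.0, Vector(0.5, 1.5)),
      Labeled(0.0, Vector(2.5, 3.5))
    )
    raw.map(toSvmLabel).foreach(println)
  }
}
```

With a Flink `DataSet`, the same function body would be passed to `map`, as the review comments later in this thread suggest.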



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9664) FlinkML Quickstart Loading Data section example doesn't work as described

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16563816#comment-16563816
 ] 

ASF GitHub Bot commented on FLINK-9664:
---

yanghua commented on issue #6425: [FLINK-9664][Doc] fixing documentation in ML 
quick start
URL: https://github.com/apache/flink/pull/6425#issuecomment-409258317
 
 
   +1



[jira] [Commented] (FLINK-9664) FlinkML Quickstart Loading Data section example doesn't work as described

2018-08-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573222#comment-16573222
 ] 

ASF GitHub Bot commented on FLINK-9664:
---

azagrebin commented on a change in pull request #6425: [FLINK-9664][Doc] fixing 
documentation in ML quick start
URL: https://github.com/apache/flink/pull/6425#discussion_r208576999
 
 

 ##
 File path: docs/dev/libs/ml/quickstart.md
 ##
 @@ -129,6 +129,10 @@ and the [test set here](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/b
 This is an astroparticle binary classification dataset, used by Hsu et al. [[3]](#hsu) in their
 practical Support Vector Machine (SVM) guide. It contains 4 numerical features, and the class label.
 
+Before importing the traning and test dataset, Flink SVM only supports threshold binary values of 
+`+1.0` and `-1.0`. Thus a conversion is needed upon downloading the svmguide1 dataset since it is 
+labelled using `1`s and `0`s.
+
 
 Review comment:
   I think this section belongs at the beginning of the next one, 
`Classification`, because this one is about the LibSVM format.
   A code example of the conversion could also be provided to make the example 
fully copy-paste runnable.
   A small thing: there is also a typo, `traning` -> `training`.
   
   I would suggest modifying the code example in this `LibSVM files` section 
like this:
   ```
   val astroTrainLibSVM: DataSet[LabeledVector] = MLUtils.readLibSVM(env, "/path/to/svmguide1")
   val astroTestLibSVM: DataSet[LabeledVector] = MLUtils.readLibSVM(env, "/path/to/svmguide1.t")
   ```
   so that it contains no SVM training specifics, and adding something like 
this to the beginning of the `Classification` section:
   
   _... After importing the training and test dataset, the data needs to be 
prepared for the classification, because Flink SVM only supports ... conversion 
is needed after downloading ..._
   And then the code example:
   ```
   // Map any positive label to +1.0 and everything else to -1.0.
   def svmNormaliser: LabeledVector => LabeledVector =
     lv => LabeledVector(if (lv.label > 0.0) 1.0 else -1.0, lv.vector)
   val astroTrain: DataSet[LabeledVector] = astroTrainLibSVM.map(svmNormaliser)
   val astroTest: DataSet[(Vector, Double)] =
     astroTestLibSVM.map(svmNormaliser).map(x => (x.vector, x.label))
   ```



[jira] [Commented] (FLINK-9664) FlinkML Quickstart Loading Data section example doesn't work as described

2018-08-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573377#comment-16573377
 ] 

ASF GitHub Bot commented on FLINK-9664:
---

walterddr commented on a change in pull request #6425: [FLINK-9664][Doc] fixing 
documentation in ML quick start
URL: https://github.com/apache/flink/pull/6425#discussion_r208629086
 
 

 ##
 File path: docs/dev/libs/ml/quickstart.md
 ##
 @@ -129,6 +129,10 @@ and the [test set here](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/b
 This is an astroparticle binary classification dataset, used by Hsu et al. [[3]](#hsu) in their
 practical Support Vector Machine (SVM) guide. It contains 4 numerical features, and the class label.
 
+Before importing the traning and test dataset, Flink SVM only supports threshold binary values of 
+`+1.0` and `-1.0`. Thus a conversion is needed upon downloading the svmguide1 dataset since it is 
+labelled using `1`s and `0`s.
+
 
 Review comment:
   I see. If I understand correctly, the suggestions can be summarized as:
   1. Move the explanation of why -1.0 and +1.0 are required to the classification page. 
   2. Keep the quickstart section clean by adding an explicit conversion from 
0,1 to -1,1 so that copy-paste will work, without expanding much on why 
-1,+1 is needed; just mention that it is needed.
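
   A small sketch of why the conversion matters when evaluating results, in plain Scala without Flink. It assumes the classifier reports its predictions as `-1.0` or `+1.0` (this thread states only that the predictor expects those labels); `accuracy`, `toSvmLabel` and the sample values are made up for illustration:

   ```scala
   object LabelMismatch {
     // Same conversion as discussed in this thread: positive -> +1.0, otherwise -1.0.
     def toSvmLabel(label: Double): Double = if (label > 0.0) 1.0 else -1.0

     // Fraction of (truth, prediction) pairs that agree exactly.
     def accuracy(pairs: Seq[(Double, Double)]): Double =
       pairs.count { case (truth, predicted) => truth == predicted }.toDouble / pairs.size

     def main(args: Array[String]): Unit = {
       // Made-up labels in the dataset's 0/1 encoding.
       val rawLabels = Seq(1.0, 0.0, 1.0, 0.0)
       // Assumed classifier output in the -1/+1 encoding, correct for every row.
       val predictions = Seq(1.0, -1.0, 1.0, -1.0)

       // A raw label of 0.0 can never equal a prediction of -1.0.
       println(accuracy(rawLabels.zip(predictions)))
       // After conversion, truth and prediction share one encoding.
       println(accuracy(rawLabels.map(toSvmLabel).zip(predictions)))
     }
   }
   ```

   Under that assumption, every negative example counts as a miss until the labels are converted, even when the classifier separates the classes perfectly.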
   



[jira] [Commented] (FLINK-9664) FlinkML Quickstart Loading Data section example doesn't work as described

2018-08-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574442#comment-16574442
 ] 

ASF GitHub Bot commented on FLINK-9664:
---

azagrebin commented on a change in pull request #6425: [FLINK-9664][Doc] fixing 
documentation in ML quick start
URL: https://github.com/apache/flink/pull/6425#discussion_r208837648
 
 

 ##
 File path: docs/dev/libs/ml/quickstart.md
 ##
 @@ -129,6 +129,10 @@ and the [test set here](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/b
 This is an astroparticle binary classification dataset, used by Hsu et al. [[3]](#hsu) in their
 practical Support Vector Machine (SVM) guide. It contains 4 numerical features, and the class label.
 
+Before importing the traning and test dataset, Flink SVM only supports threshold binary values of 
+`+1.0` and `-1.0`. Thus a conversion is needed upon downloading the svmguide1 dataset since it is 
+labelled using `1`s and `0`s.
+
 
 Review comment:
   By sections I mean the `LibSVM files` and `Classification` parts of 
`quickstart.md`.
   I think your explanation of why we need the conversion was good and detailed 
enough. I just suggested rephrasing its start a bit so that it can be moved to 
the beginning of the `Classification` section. The conversion example can 
follow your explanation. The overall structure I suggest:
   
   *LibSVM files*
   ...Text as before...
   ```
   // leave only the LibSVM importing specifics in this example:
   val astroTrainLibSVM: DataSet[LabeledVector] = MLUtils.readLibSVM(env, "/path/to/svmguide1")
   val astroTestLibSVM: DataSet[LabeledVector] = MLUtils.readLibSVM(env, "/path/to/svmguide1.t")
   ```
   ...Text as before...
   
   *Classification*
   ...Explanation of why the conversion is needed before classification...:
   ```
   // conversion code example, e.g. the one I suggested
   ```
   ...section continues as it was, with the classification description and its 
example...
   
   The idea is that in the end the user can just copy/paste the code snippets, 
starting from the import code, then conversion/normalisation, then 
classification etc., and everything eventually works together.



[jira] [Commented] (FLINK-9664) FlinkML Quickstart Loading Data section example doesn't work as described

2018-08-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16577223#comment-16577223
 ] 

ASF GitHub Bot commented on FLINK-9664:
---

walterddr commented on a change in pull request #6425: [FLINK-9664][Doc] fixing 
documentation in ML quick start
URL: https://github.com/apache/flink/pull/6425#discussion_r209430609
 
 

 ##
 File path: docs/dev/libs/ml/quickstart.md
 ##
 @@ -129,6 +129,10 @@ and the [test set here](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/b
 This is an astroparticle binary classification dataset, used by Hsu et al. [[3]](#hsu) in their
 practical Support Vector Machine (SVM) guide. It contains 4 numerical features, and the class label.
 
+Before importing the traning and test dataset, Flink SVM only supports threshold binary values of 
+`+1.0` and `-1.0`. Thus a conversion is needed upon downloading the svmguide1 dataset since it is 
+labelled using `1`s and `0`s.
+
 
 Review comment:
   Thanks for the detailed explanation, @azagrebin. Sorry for the earlier 
confusion. I updated the document; please take another look when you have time 
:-)



[jira] [Commented] (FLINK-9664) FlinkML Quickstart Loading Data section example doesn't work as described

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579588#comment-16579588
 ] 

ASF GitHub Bot commented on FLINK-9664:
---

azagrebin commented on a change in pull request #6425: [FLINK-9664][Doc] fixing 
documentation in ML quick start
URL: https://github.com/apache/flink/pull/6425#discussion_r209894118
 
 

 ##
 File path: docs/dev/libs/ml/quickstart.md
 ##
 @@ -146,7 +145,23 @@ create a classifier.
 
 ## Classification
 
-Once we have imported the dataset we can train a `Predictor` such as a linear SVM classifier.
+After importing the training and test dataset, they need to be prepared for the classification. 
+Because Flink SVM only supports threshold binary values of `+1.0` and `-1.0`, a conversion is 
+needed after loading the LibSVM dataset since it is labelled using `1`s and `0`s.
+
+A conversion can be done using a simple normalizer mapping function:
+ 
+{% highlight scala %}
+
+def normalizer : LabeledVector => LabeledVector = { 
+lv => LabeledVector(if (lv.label > 0.0) 1.0 else -1.0, lv.vector)
+}
+val astroTrain: DataSet[LabeledVector] = astroTrainLibSVM.map(normalizer)
+val astroTest: DataSet[(Vector, Double)] = astroTestLibSVM.map(normalizer).map(x => (x.vector, x.label))
+
+{% endhighlight %}
+
+Once we have the converted the dataset we can train a `Predictor` such as a linear SVM classifier.
 
 Review comment:
   One superfluous `the`:
   Once we have the converted **the** dataset



[jira] [Commented] (FLINK-9664) FlinkML Quickstart Loading Data section example doesn't work as described

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579589#comment-16579589
 ] 

ASF GitHub Bot commented on FLINK-9664:
---

azagrebin commented on a change in pull request #6425: [FLINK-9664][Doc] fixing 
documentation in ML quick start
URL: https://github.com/apache/flink/pull/6425#discussion_r209897278
 
 

 ##
 File path: docs/dev/libs/ml/quickstart.md
 ##
 @@ -146,7 +145,23 @@ create a classifier.
 
 ## Classification
 
-Once we have imported the dataset we can train a `Predictor` such as a linear SVM classifier.
+After importing the training and test dataset, they need to be prepared for the classification. 
+Because Flink SVM only supports threshold binary values of `+1.0` and `-1.0`, a conversion is 
+needed after loading the LibSVM dataset since it is labelled using `1`s and `0`s.
 
 Review comment:
   Just a wording thing, I would swap `Because` and `since` in this sentence:
   **Since** Flink SVM only supports threshold binary values of `+1.0` and 
`-1.0`, a conversion is 
   needed after loading the LibSVM dataset **because** it is labelled using 
`1`s and `0`s.



[jira] [Commented] (FLINK-9664) FlinkML Quickstart Loading Data section example doesn't work as described

2018-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579596#comment-16579596
 ] 

ASF GitHub Bot commented on FLINK-9664:
---

zentol closed pull request #6425: [FLINK-9664][Doc] fixing documentation in ML 
quick start
URL: https://github.com/apache/flink/pull/6425
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/docs/dev/libs/ml/quickstart.md b/docs/dev/libs/ml/quickstart.md
index ea6f8049755..e056b28b505 100644
--- a/docs/dev/libs/ml/quickstart.md
+++ b/docs/dev/libs/ml/quickstart.md
@@ -129,15 +129,14 @@ and the [test set here](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/b
 This is an astroparticle binary classification dataset, used by Hsu et al. [[3]](#hsu) in their
 practical Support Vector Machine (SVM) guide. It contains 4 numerical features, and the class label.
 
-We can simply import the dataset then using:
+We can simply import the dataset using:
 
 {% highlight scala %}
 
 import org.apache.flink.ml.MLUtils
 
-val astroTrain: DataSet[LabeledVector] = MLUtils.readLibSVM(env, "/path/to/svmguide1")
-val astroTest: DataSet[(Vector, Double)] = MLUtils.readLibSVM(env, "/path/to/svmguide1.t")
-  .map(x => (x.vector, x.label))
+val astroTrainLibSVM: DataSet[LabeledVector] = MLUtils.readLibSVM(env, "/path/to/svmguide1")
+val astroTestLibSVM: DataSet[LabeledVector] = MLUtils.readLibSVM(env, "/path/to/svmguide1.t")
 
 {% endhighlight %}
 
@@ -146,7 +145,23 @@ create a classifier.
 
 ## Classification
 
-Once we have imported the dataset we can train a `Predictor` such as a linear SVM classifier.
+After importing the training and test dataset, they need to be prepared for the classification. 
+Since Flink SVM only supports threshold binary values of `+1.0` and `-1.0`, a conversion is 
+needed after loading the LibSVM dataset because it is labelled using `1`s and `0`s.
+
+A conversion can be done using a simple normalizer mapping function:
+ 
+{% highlight scala %}
+
+def normalizer : LabeledVector => LabeledVector = { 
+lv => LabeledVector(if (lv.label > 0.0) 1.0 else -1.0, lv.vector)
+}
+val astroTrain: DataSet[LabeledVector] = astroTrainLibSVM.map(normalizer)
+val astroTest: DataSet[(Vector, Double)] = astroTestLibSVM.map(normalizer).map(x => (x.vector, x.label))
+
+{% endhighlight %}
+
+Once we have converted the dataset we can train a `Predictor` such as a linear SVM classifier.
 We can set a number of parameters for the classifier. Here we set the `Blocks` parameter,
 which is used to split the input by the underlying CoCoA algorithm [[2]](#jaggi) uses. The
 regularization parameter determines the amount of $l_2$ regularization applied, which is used
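
The label conversion introduced by this diff can be tried in isolation, without a Flink setup. The following is a minimal sketch and not the code from the PR: `LabeledVector` is a stand-in case class (an assumption, standing in for the FlinkML type of the same name), it uses a plain Scala `Vector[Double]` for the features, and `LabelNormalizerSketch` is a hypothetical name. Only the mapping rule itself (`label > 0.0` becomes `1.0`, anything else becomes `-1.0`) is taken from the diff.

```scala
// Stand-in for FlinkML's LabeledVector so the sketch runs without Flink on
// the classpath (assumption: only the `label` and `vector` fields matter here).
final case class LabeledVector(label: Double, vector: Vector[Double])

object LabelNormalizerSketch {
  // Same rule as the `normalizer` in the diff: positive labels map to +1.0,
  // everything else (including the dataset's 0 labels) maps to -1.0.
  val normalizer: LabeledVector => LabeledVector =
    lv => LabeledVector(if (lv.label > 0.0) 1.0 else -1.0, lv.vector)

  def main(args: Array[String]): Unit = {
    // Two made-up records in the 0/1 labelling used by the LibSVM file.
    val raw = Seq(
      LabeledVector(1.0, Vector(0.5, 1.2)),
      LabeledVector(0.0, Vector(0.1, 0.3))
    )
    // Convert the labels; the feature vectors are passed through unchanged.
    raw.map(normalizer).foreach(println)
  }
}
```

With the real FlinkML types, the same function would be passed to `DataSet.map`, as the diff does for `astroTrainLibSVM` and `astroTestLibSVM`.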


 




> FlinkML Quickstart Loading Data section example doesn't work as described
> -
>
> Key: FLINK-9664
> URL: https://issues.apache.org/jira/browse/FLINK-9664
> Project: Flink
> Issue Type: Bug
> Components: Documentation, Machine Learning Library
> Affects Versions: 1.5.0
> Reporter: Mano Swerts
> Assignee: Rong Rong
> Priority: Major
> Labels: documentation-update, machine_learning, ml, pull-request-available
> Fix For: 1.7.0
>
> Original Estimate: 1h
> Remaining Estimate: 1h


