RE: Aggregation problem.
Hi Nico,

I found the problem. I am also using xgboost, and its library pulls in an old version of Flink. I removed xgboost's jar-with-dependencies library. Thank you for your interest.

Regards,
Kursat

-Original Message-
From: Nico Kruber [mailto:n...@data-artisans.com]
Sent: Thursday, April 13, 2017 5:07 PM
To: Kürşat Kurt
Cc: user@flink.apache.org
Subject: Re: Aggregation problem.

I failed to reproduce your error. How did you set up your project: SBT, Maven? Maybe its dependency management is referring to an old version of Flink? Maybe different versions of Scala are mixed? In that case, you may try setting up a new project:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/quickstart/scala_api_quickstart.html

When do you get the error? During compilation in Eclipse? After submitting the job to Flink?

Nico
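For readers hitting the same conflict: rather than removing the jar-with-dependencies, the offending transitive dependency can be excluded in the build. A sketch of such an exclusion in a Maven pom; the xgboost coordinates and version are illustrative guesses, not taken from this thread:

```xml
<dependency>
  <groupId>ml.dmlc</groupId>
  <artifactId>xgboost4j</artifactId>
  <version>0.7</version>
  <exclusions>
    <!-- keep xgboost from dragging in its own (older) Flink -->
    <exclusion>
      <groupId>org.apache.flink</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

With the exclusion in place, only the Flink version declared by the project itself ends up on the classpath.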
Index conversion
Hi,

I have a label-index DataSet and a final DataSet whose indexes I want to convert to labels.

ixDS: DataSet[(Long, String)]
(1,Car)
(2,Sports)
(3,Home)
...

finalDS: DataSet[(Long, String, Double, String, Double)]
(1,x,1,y,4)
(2,z,3,t,5)
...

If I try to convert finalDS's indexes with a join like this:

val res = finalDS.join(ixDS).where(0).equalTo(0) { (l, r) => (r._2, l._2) }
res.print()

I cannot get the labeled indexes. If I collect both DataSets and match each element, I can get the labels:

finalDS.collect().foreach(x => {
  ixDS.collect().foreach(y => {
    if (x._1 == y._1) System.err.println("" + x._2 + "," + y._2)
  })
})

...but I think this is the wrong way. What is the correct way to do the conversion?
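The join in the question is essentially an index-to-label lookup. As a sanity check of the intended semantics, here is the same lookup sketched on plain Scala collections (sample data mirrors the question; the output field selection is illustrative):

```scala
// Label-index pairs, mirroring ixDS: (index, label).
val ixDS = Seq((1L, "Car"), (2L, "Sports"), (3L, "Home"))

// Records whose first field is the index to resolve, mirroring finalDS.
val finalDS = Seq((1L, "x", 1.0, "y", 4.0), (2L, "z", 3.0, "t", 5.0))

// Build a lookup map once instead of nested collect() loops.
val labelOf: Map[Long, String] = ixDS.toMap

// Replace the index with its label, keeping the rest of the record.
val resolved = finalDS.flatMap { rec =>
  labelOf.get(rec._1).map(label => (label, rec._2, rec._3, rec._4, rec._5))
}

resolved.foreach(println)
```

In a real Flink job the same shape is expressed with the join from the question; the key point is that the join function decides which fields of the matched pair survive into the output.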
RE: Aggregation problem.
I have downloaded the latest binary (http://www.apache.org/dyn/closer.lua/flink/flink-1.2.0/flink-1.2.0-bin-hadoop27-scala_2.11.tgz). I am getting this error in Eclipse Neon (3).

Regards,
Kursat

-Original Message-
From: Nico Kruber [mailto:n...@data-artisans.com]
Sent: Tuesday, April 11, 2017 3:34 PM
To: user@flink.apache.org
Cc: Kürşat Kurt
Subject: Re: Aggregation problem.

maxBy() is still a member of org.apache.flink.api.scala.GroupedDataSet in the current sources - what did you upgrade Flink to?

Also please make sure the new version is used, or - if compiled from sources - try a "mvn clean install" to get rid of old intermediate files.

Regards
Nico
RE: Aggregation problem.
Hi,

I have just upgraded Flink and can't use maxBy on a grouped DataSet. I am getting the error below:

value maxBy is not a member of org.apache.flink.api.scala.GroupedDataSet
RE: Aggregation problem.
Yes, it works.

Thank you Yassine.

From: Yassine MARZOUGUI [mailto:y.marzou...@mindlytix.com]
Sent: Saturday, February 18, 2017 2:48 PM
To: user@flink.apache.org
Subject: RE: Aggregation problem.

Hi,

I think this is an expected output and not necessarily a bug. To get the element having the maximum value, maxBy() should be used instead of max().

See this answer for more details:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Wrong-and-non-consistent-behavior-of-max-tp484p488.html

Best,
Yassine
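The max()/maxBy() difference discussed in this thread can be sketched on plain Scala collections: maxBy keeps the whole record that wins on the chosen field, whereas max(4) only guarantees the maximum of field 4 per group and may combine it with field values from other records of that group. A minimal sketch of the maxBy semantics, using the thread's data:

```scala
// Sample records: (group, category, weight, count, score).
val ds = Seq(
  (0, "Auto",     0.4, 1, 5.8317538999854194e-5),
  (0, "Computer", 0.2, 1, 4.8828125e-5),
  (0, "Sports",   0.4, 2, 1.7495261699956258e-4),
  (1, "Auto",     0.4, 1, 1.7495261699956258e-4),
  (1, "Computer", 0.2, 1, 4.8828125e-5),
  (1, "Sports",   0.4, 1, 5.8317538999854194e-5)
)

// maxBy-style semantics: keep the whole record with the largest last field.
val winners = ds.groupBy(_._1).map { case (group, recs) =>
  group -> recs.maxBy(_._5)
}

// Group 0's winner is the complete Sports record (count field 2 intact);
// group 1's winner is the Auto record, matching the expected output.
winners.toSeq.sortBy(_._1).foreach(println)
```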
RE: Aggregation problem.
Ok, I have opened the issue with the test case.

Thanks.

https://issues.apache.org/jira/browse/FLINK-5840

From: Fabian Hueske [mailto:fhue...@gmail.com]
Sent: Saturday, February 18, 2017 3:33 AM
To: user@flink.apache.org
Subject: Re: Aggregation problem.

Hi,

this looks like a bug to me.

Can you open a JIRA and maybe a small test case to reproduce the issue?

Thank you,

Fabian
Aggregation problem.
Hi,

I have a DataSet like this:

(0,Auto,0.4,1,5.8317538999854194E-5)
(0,Computer,0.2,1,4.8828125E-5)
(0,Sports,0.4,2,1.7495261699956258E-4)
(1,Auto,0.4,1,1.7495261699956258E-4)
(1,Computer,0.2,1,4.8828125E-5)
(1,Sports,0.4,1,5.8317538999854194E-5)

This code, ds.groupBy(0).max(4).print(), prints:

(0,Sports,0.4,1,1.7495261699956258E-4)
(1,Sports,0.4,1,1.7495261699956258E-4)

...but I am expecting:

(0,Sports,0.4,2,1.7495261699956258E-4)
(1,Auto,0.4,1,1.7495261699956258E-4)

What is wrong with this code?
Multiclass classification example
Hi,

I am trying to learn the Flink ML library. Where can I find a detailed multiclass classification example?
SVM Multiclass classification
Hi,

I am trying to classify documents. When I predict on the same set as the training set, I only get 1 and -1 predictions, and accuracy is 0%. Can you help me please?

val env = ExecutionEnvironment.getExecutionEnvironment

val training = Seq(
  new LabeledVector(1.0, new SparseVector(10, Array(0, 2, 3), Array(1.0, 1.0, 1.0))),
  new LabeledVector(1.0, new SparseVector(10, Array(0, 1, 5, 9), Array(1.0, 1.0, 1.0, 1.0))),
  new LabeledVector(0.0, new SparseVector(10, Array(0, 2), Array(0.0, 1.0))),
  new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0))),
  new LabeledVector(2.0, new SparseVector(10, Array(0, 2), Array(0.0, 1.0))),
  new LabeledVector(2.0, new SparseVector(10, Array(0), Array(0.0))),
  new LabeledVector(1.0, new SparseVector(10, Array(0, 3), Array(1.0, 1.0))),
  new LabeledVector(0.0, new SparseVector(10, Array(0, 2, 3), Array(0.0, 1.0, 1.0))),
  new LabeledVector(2.0, new SparseVector(10, Array(0, 7, 9), Array(0.0, 1.0, 1.0))),
  new LabeledVector(2.0, new SparseVector(10, Array(2, 3, 4), Array(0.0, 1.0, 1.0))),
  new LabeledVector(2.0, new SparseVector(10, Array(0, 3), Array(1.0, 1.0))),
  new LabeledVector(0.0, new SparseVector(10, Array(2, 3, 9), Array(1.0, 0.0, 1.0)))
)

val trainingDS = env.fromCollection(training)
val testingDS = env.fromCollection(training)

val svm = new SVM().setBlocks(env.getParallelism)
svm.fit(trainingDS)

val predictions = svm.evaluate(testingDS.map(x => (x.vector, x.label)))
predictions.print()

Sample output:

(1.0,1.0)
(1.0,1.0)
(0.0,1.0)
(0.0,-1.0)
(2.0,1.0)
(2.0,-1.0)
(1.0,1.0)
(0.0,1.0)
(2.0,1.0)
(2.0,1.0)
(2.0,1.0)
(0.0,1.0)
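For context: FlinkML's SVM is a binary classifier that predicts the sign of its decision function, which is why only 1 and -1 appear even though the labels here are 0, 1, and 2. Multiclass problems are usually reduced to several binary ones (one-vs-rest) and decided by the most confident score. A sketch of the one-vs-rest voting step on plain Scala; the per-class score functions below are illustrative stand-ins, not trained models:

```scala
// Hypothetical per-class decision scores from three binary one-vs-rest SVMs.
// In practice each function would be a trained SVM's raw decision value.
val scorers: Map[Double, Array[Double] => Double] = Map(
  0.0 -> ((v: Array[Double]) => -v.sum),          // stand-in score for class 0
  1.0 -> ((v: Array[Double]) => v(0) + v(3)),     // stand-in score for class 1
  2.0 -> ((v: Array[Double]) => v(2) + v(4))      // stand-in score for class 2
)

// One-vs-rest prediction: pick the class whose scorer is most confident.
def predict(v: Array[Double]): Double =
  scorers.maxBy { case (_, score) => score(v) }._1

val sample = Array(1.0, 0.0, 0.0, 1.0, 0.0)
println(predict(sample))  // class 1's scorer wins on this sample
```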
RE: Wordindex conversion.
Ok, thanks Fabian.

From: Fabian Hueske [mailto:fhue...@gmail.com]
Sent: Tuesday, October 11, 2016 1:12 AM
To: user@flink.apache.org
Subject: Re: Wordindex conversion.

Hi,

you can do it like this:

1) Split each label record of the main dataset into separate records:
(0,List(a, b, c, d, e, f, g)) -> (0, a), (0, b), (0, c), ..., (0, g)
(1,List(b, c, f, a, g)) -> (1, b), (1, c), ..., (1, g)

2) Join the word-index dataset with the split main dataset:
DataSet<Tuple2<Long, String>> splittedMain = ...
DataSet<Tuple2<Long, String>> wordIdx = ...
DataSet joined = splittedMain.join(wordIdx).where(1).equalTo(1).with(...)

3) Group by label:
DataSet<Tuple2<Long, List<Long>>> labelsWithIdx = joined.groupBy(0).reduceGroup(...) // collect all indexes in a list / array

Best,
Fabian
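Fabian's three steps can be sketched end-to-end on plain Scala collections (no Flink required); the data mirrors the thread's example:

```scala
// Main dataset: (label, word list).
val mainDS = Seq(
  (0L, List("a", "b", "c", "d", "e", "f", "g")),
  (1L, List("b", "c", "f", "a", "g"))
)

// Word index as produced by zipWithIndex-style numbering: (index, word).
val wordIdx = Seq((0L, "a"), (1L, "b"), (2L, "c"), (3L, "d"),
                  (4L, "e"), (5L, "f"), (6L, "g"))

// Step 1: split each (label, words) record into (label, word) pairs.
val splitMain = mainDS.flatMap { case (label, words) => words.map(label -> _) }

// Step 2: join on the word to attach its index (a map stands in for the join).
val idxOf: Map[String, Long] = wordIdx.map(_.swap).toMap
val joined = splitMain.map { case (label, word) => (label, idxOf(word)) }

// Step 3: group by label and collect the indexes back into a list.
val labelsWithIdx = joined.groupBy(_._1).map { case (label, pairs) =>
  (label, pairs.map(_._2))
}

labelsWithIdx.toSeq.sortBy(_._1).foreach(println)
```

In Flink the same steps map to flatMap, join(...).where(1).equalTo(1), and groupBy(0).reduceGroup(...), as in the reply above.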
Wordindex conversion.
Hi,

I have a main dataset of (Label, WordList):

(0,List(a, b, c, d, e, f, g))
(1,List(b, c, f, a, g))

...and a word-index dataset (created with .zipWithIndex):

wordIndex> (0,a)
wordIndex> (1,b)
wordIndex> (2,c)
wordIndex> (3,d)
wordIndex> (4,e)
wordIndex> (5,f)
wordIndex> (6,g)

How can I convert the main dataset to an indexed word-list dataset like this:

(0,List(0, 1, 2, 3, 4, 5, 6))
(1,List(1, 2, 5, 0, 6))
SVM classification problem.
Hi,

I am trying to train and predict with the same set. I expect that accuracy should be 100%; am I wrong? When I predict on the same set it fails, and it also produces the class "-1", which is not in the training set. What is wrong with this code?

Code:

def main(args: Array[String]): Unit = {

  val env = ExecutionEnvironment.getExecutionEnvironment

  val training = Seq(
    new LabeledVector(1.0, new SparseVector(10, Array(0, 2, 3), Array(1.0, 1.0, 1.0))),
    new LabeledVector(1.0, new SparseVector(10, Array(0, 1, 5, 9), Array(1.0, 1.0, 1.0, 1.0))),
    new LabeledVector(0.0, new SparseVector(10, Array(0, 2), Array(0.0, 1.0))),
    new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0))),
    new LabeledVector(0.0, new SparseVector(10, Array(0, 2), Array(0.0, 1.0))),
    new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0)))
  )

  val trainingDS = env.fromCollection(training)
  val testingDS = env.fromCollection(training)

  val svm = new SVM().setBlocks(env.getParallelism)
  svm.fit(trainingDS)

  val predictions = svm.evaluate(testingDS.map(x => (x.vector, x.label)))
  predictions.print()
}

Output:

(1.0,1.0)
(1.0,1.0)
(0.0,1.0)
(0.0,-1.0)
(0.0,1.0)
(0.0,-1.0)
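One likely explanation for the -1 values: FlinkML's SVM predicts in the {-1, +1} label space, so the 0.0 labels above can never be matched exactly. A sketch of recoding labels to ±1 before training and back afterwards, on plain Scala collections with hypothetical helper names:

```scala
// (label, features) pairs with 0/1 labels, as in the question.
val data = Seq((1.0, Array(1.0, 1.0)), (0.0, Array(0.0, 1.0)))

// Recode 0/1 labels into the -1/+1 space an SVM predicts in.
def toSvmLabel(l: Double): Double = if (l == 0.0) -1.0 else 1.0

// Map a raw SVM prediction back to the original 0/1 space.
def fromSvmLabel(l: Double): Double = if (l < 0) 0.0 else 1.0

val recoded = data.map { case (l, v) => (toSvmLabel(l), v) }

// After svm.evaluate, comparing fromSvmLabel(prediction) against the
// original label makes the accuracy computation meaningful again.
recoded.map(_._1).foreach(println)
```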