RE: Aggregation problem.

2017-04-17 Thread Kürşat Kurt
Hi Nico;

I found the problem. I am also using XGBoost, and its library bundles an old
version of Flink.
I removed XGBoost's jar-with-dependencies from the classpath.
Thank you for your interest.

Regards,
Kursat.
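
For anyone who needs to keep such a dependency rather than remove it, a minimal
sketch of excluding its bundled Flink artifacts, assuming an SBT build; the
xgboost4j coordinates and version below are hypothetical placeholders, not
taken from this thread:

// build.sbt -- exclude the Flink artifacts bundled by the conflicting
// dependency, so only the project's own Flink version is on the classpath.
// Group, artifact, and version here are illustrative placeholders.
libraryDependencies += ("ml.dmlc" % "xgboost4j" % "0.7")
  .exclude("org.apache.flink", "flink-scala_2.11")
  .exclude("org.apache.flink", "flink-core")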

-Original Message-
From: Nico Kruber [mailto:n...@data-artisans.com] 
Sent: Thursday, April 13, 2017 5:07 PM
To: Kürşat Kurt 
Cc: user@flink.apache.org
Subject: Re: Aggregation problem.

I failed to reproduce your error.

How did you set up your project: SBT, Maven?
Maybe its dependency management is pulling in an old version of Flink, or
maybe different versions of Scala are mixed?

In that case, you may try setting up a new project:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/quickstart/scala_api_quickstart.html

When do you get the error? During compilation in Eclipse? After submitting the
job to Flink?

Nico

On Wednesday, 12 April 2017 01:15:37 CEST Kürşat Kurt wrote:
> I have downloaded the latest binary
> (http://www.apache.org/dyn/closer.lua/flink/flink-1.2.0/flink-1.2.0-bin-hadoop27-scala_2.11.tgz).
> I am getting this error in Eclipse Neon(3).
> 
> Regards,
> Kursat
> 
> -Original Message-
> From: Nico Kruber [mailto:n...@data-artisans.com]
> Sent: Tuesday, April 11, 2017 3:34 PM
> To: user@flink.apache.org
> Cc: Kürşat Kurt 
> Subject: Re: Aggregation problem.
> 
> maxBy() is still a member of org.apache.flink.api.scala.GroupedDataSet
> in the current sources - what did you upgrade Flink to?
> 
> Also please make sure the new version is used, or - if compiled from 
> sources
> - try a "mvn clean install" to get rid of old intermediate files.
> 
> 
> Regards
> Nico
> 
> On Sunday, 9 April 2017 00:38:23 CEST Kürşat Kurt wrote:
> > Hi;
> > 
> > 
> > 
> > I have just upgraded Flink and can't use maxBy on a grouped dataset.
> > 
> > I am getting the error below.
> > 
> > 
> > 
> > value maxBy is not a member of org.apache.flink.api.scala.GroupedDataSet
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > From: Kürşat Kurt [mailto:kur...@kursatkurt.com]
> > Sent: Sunday, February 19, 2017 1:28 AM
> > To: user@flink.apache.org
> > Subject: RE: Aggregation problem.
> > 
> > 
> > 
> > Yes, it works.
> > 
> > Thank you Yassine.
> > 
> > 
> > 
> > From: Yassine MARZOUGUI [mailto:y.marzou...@mindlytix.com]
> > Sent: Saturday, February 18, 2017 2:48 PM
> > To: user@flink.apache.org
> > Subject: RE: Aggregation problem.
> > 
> > 
> > 
> > Hi,
> > 
> > 
> > 
> > I think this is an expected output and not necessarily a bug. To get 
> > the element having the maximum value, maxBy() should be used instead of 
> > max().
> > 
> > 
> > 
> > See this answer for more details:
> > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Wrong-and-non-consistent-behavior-of-max-tp484p488.html
> > 
> > 
> > 
> > Best,
> > 
> > Yassine
> > 
> > 
> > 
> > On Feb 18, 2017 12:28, "Kürşat Kurt" <kur...@kursatkurt.com> wrote:
> > 
> > OK, I have opened the issue with the test case.
> > 
> > Thanks.
> > 
> > 
> > 
> > https://issues.apache.org/jira/browse/FLINK-5840
> > 
> > 
> > 
> > 
> > 
> > From: Fabian Hueske [mailto:fhue...@gmail.com]
> > Sent: Saturday, February 18, 2017 3:33 AM
> > To: user@flink.apache.org
> > Subject: Re: Aggregation problem.
> > 
> > 
> > 
> > Hi,
> > 
> > this looks like a bug to me.
> > 
> > Can you open a JIRA and maybe add a small test case to reproduce the issue?
> > 
> > Thank you,
> > 
> > Fabian
> > 
> > 
> > 
> > 2017-02-18 1:06 GMT+01:00 Kürşat Kurt <kur...@kursatkurt.com>:
> > 
> > Hi;
> > 
> > 
> > 
> > I have a Dataset like this:
> > 
> > 
> > 
> > (0,Auto,0.4,1,5.8317538999854194E-5)
> > 
> > (0,Computer,0.2,1,4.8828125E-5)
> > 
> > (0,Sports,0.4,2,1.7495261699956258E-4)
> > 
> > (1,Auto,0.4,1,1.7495261699956258E-4)
> > 
> > (1,Computer,0.2,1,4.8828125E-5)
> > 
> > (1,Sports,0.4,1,5.8317538999854194E-5)
> > 
> > 
> > 
> > This code, ds.groupBy(0).max(4).print(), prints:
> > 
> > 
> > 
> > (0,Sports,0.4,1,1.7495261699956258E-4)
> > 
> > (1,Sports,0.4,1,1.7495261699956258E-4)
> > 
> > 
> > 
> > ...but I am expecting:
> > 
> > 
> > 
> > (0,Sports,0.4,2,1.7495261699956258E-4)
> > 
> > (1,Auto,0.4,1,1.7495261699956258E-4)
> > 
> > 
> > 
> > What is wrong with this code?
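
To close the loop on the thread above: a minimal, self-contained sketch of the
maxBy() fix Yassine suggested, run on the dataset from the original question
(the object name is illustrative):

import org.apache.flink.api.scala._

object MaxByExample {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val ds = env.fromElements(
      (0, "Auto", 0.4, 1, 5.8317538999854194E-5),
      (0, "Computer", 0.2, 1, 4.8828125E-5),
      (0, "Sports", 0.4, 2, 1.7495261699956258E-4),
      (1, "Auto", 0.4, 1, 1.7495261699956258E-4),
      (1, "Computer", 0.2, 1, 4.8828125E-5),
      (1, "Sports", 0.4, 1, 5.8317538999854194E-5))

    // max(4) only guarantees the maximum of field 4 within each group and may
    // fill the other fields from different records of the group; maxBy(4)
    // returns the whole element whose field 4 is maximal, i.e. the expected
    // (0,Sports,0.4,2,...) and (1,Auto,0.4,1,...).
    ds.groupBy(0).maxBy(4).print()
  }
}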




Index conversion

2017-04-17 Thread Kürşat Kurt
Hi;

 

I have a label-index DataSet and a final DataSet whose indexes I want to
convert to labels.

 

ixDS: DataSet[(Long, String)]

(1,Car)

(2,Sports)

(3,Home)

...

 

finalDS: DataSet[(Long, String, Double, String, Double)]

(1,x,1,y,4)

(2,z,3,t,5)

...

 

I tried to convert finalDS's indexes with a join like this:

 

val res = finalDS.join(ixDS).where(0).equalTo(0) { (l, r) =>
  (r._2, l._2)
}

res.print()

 

I cannot get the labeled indexes this way.

If I collect both DataSets and match them element by element, I can get the labels:

 

finalDS.collect().foreach(x => {
  ixDS.collect().foreach(y => {
    if (x._1 == y._1) System.err.println("" + x._2 + "," + y._2)
  })
})

 

...but I think this is the wrong way.

What is the correct way for conversion?
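
A minimal sketch of the join-based answer, assuming the goal is to replace the
Long key of each finalDS record with its label while keeping the remaining
fields (the exact field selection is an assumption, not code from this thread):

// Enrich each finalDS record with its label via a distributed equi-join on
// field 0 instead of the nested collect() loops above; l is the finalDS
// element, r the (index, label) element from ixDS.
val labeled = finalDS.join(ixDS).where(0).equalTo(0) { (l, r) =>
  (r._2, l._2, l._3, l._4, l._5)
}
labeled.print()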



RE: Aggregation problem.

2017-04-11 Thread Kürşat Kurt
I have downloaded the latest binary
(http://www.apache.org/dyn/closer.lua/flink/flink-1.2.0/flink-1.2.0-bin-hadoop27-scala_2.11.tgz).
I am getting this error in Eclipse Neon(3).

Regards,
Kursat





RE: Aggregation problem.

2017-04-08 Thread Kürşat Kurt
Hi;

 

I have just upgraded Flink and can't use maxBy on a grouped dataset.

I am getting the error below.

 

value maxBy is not a member of org.apache.flink.api.scala.GroupedDataSet



RE: Aggregation problem.

2017-02-18 Thread Kürşat Kurt
Yes, it works.

Thank you Yassine.



RE: Aggregation problem.

2017-02-18 Thread Kürşat Kurt
OK, I have opened the issue with the test case.

Thanks.

 

https://issues.apache.org/jira/browse/FLINK-5840



Aggregation problem.

2017-02-17 Thread Kürşat Kurt
Hi;

 

I have a Dataset like this:

 

(0,Auto,0.4,1,5.8317538999854194E-5)

(0,Computer,0.2,1,4.8828125E-5)

(0,Sports,0.4,2,1.7495261699956258E-4)

(1,Auto,0.4,1,1.7495261699956258E-4)

(1,Computer,0.2,1,4.8828125E-5)

(1,Sports,0.4,1,5.8317538999854194E-5)

 

This code, ds.groupBy(0).max(4).print(), prints:

 

(0,Sports,0.4,1,1.7495261699956258E-4)

(1,Sports,0.4,1,1.7495261699956258E-4)

 

...but I am expecting:

 

(0,Sports,0.4,2,1.7495261699956258E-4)

(1,Auto,0.4,1,1.7495261699956258E-4)

 

What is wrong with this code?



Multiclass classification example

2016-10-18 Thread Kürşat Kurt
Hi;


I am trying to learn the Flink ML library.

Where can I find a detailed multiclass classification example?



SVM Multiclass classification

2016-10-13 Thread Kürşat Kurt
Hi;

 

I am trying to classify documents.

When I predict on the same set that I trained on, there are only 1 and -1
predictions.

Accuracy is 0%.

 

 

Can you help me please?

 

val env = ExecutionEnvironment.getExecutionEnvironment

val training = Seq(
  new LabeledVector(1.0, new SparseVector(10, Array(0, 2, 3), Array(1.0, 1.0, 1.0))),
  new LabeledVector(1.0, new SparseVector(10, Array(0, 1, 5, 9), Array(1.0, 1.0, 1.0, 1.0))),
  new LabeledVector(0.0, new SparseVector(10, Array(0, 2), Array(0.0, 1.0))),
  new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0))),
  new LabeledVector(2.0, new SparseVector(10, Array(0, 2), Array(0.0, 1.0))),
  new LabeledVector(2.0, new SparseVector(10, Array(0), Array(0.0))),
  new LabeledVector(1.0, new SparseVector(10, Array(0, 3), Array(1.0, 1.0))),
  new LabeledVector(0.0, new SparseVector(10, Array(0, 2, 3), Array(0.0, 1.0, 1.0))),
  new LabeledVector(2.0, new SparseVector(10, Array(0, 7, 9), Array(0.0, 1.0, 1.0))),
  new LabeledVector(2.0, new SparseVector(10, Array(2, 3, 4), Array(0.0, 1.0, 1.0))),
  new LabeledVector(2.0, new SparseVector(10, Array(0, 3), Array(1.0, 1.0))),
  new LabeledVector(0.0, new SparseVector(10, Array(2, 3, 9), Array(1.0, 0.0, 1.0))))

val trainingDS = env.fromCollection(training)
val testingDS = env.fromCollection(training)

val svm = new SVM().setBlocks(env.getParallelism)
svm.fit(trainingDS)

val predictions = svm.evaluate(testingDS.map(x => (x.vector, x.label)))
predictions.print()

 

Sample output:

 

(1.0,1.0)

(1.0,1.0)

(0.0,1.0)

(0.0,-1.0)

(2.0,1.0)

(2.0,-1.0)

(1.0,1.0)

(0.0,1.0)

(2.0,1.0)

(2.0,1.0)

(2.0,1.0)

(0.0,1.0)
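
For what it's worth, FlinkML's SVM is a binary classifier: it thresholds its
decision value to +1 or -1, which is why the 0.0/1.0/2.0 labels above never
appear in the predictions. A minimal one-vs-rest sketch under that assumption
(the helper name and setBlocks value are illustrative, not from this thread):

import org.apache.flink.api.scala._
import org.apache.flink.ml.classification.SVM
import org.apache.flink.ml.common.LabeledVector

// Train one binary model per class: remap labels to +1 for the target class
// and -1 for everything else, then fit the (binary) SVM as usual.
def oneVsRest(data: DataSet[LabeledVector], targetLabel: Double): SVM = {
  val binarized = data.map(lv =>
    LabeledVector(if (lv.label == targetLabel) 1.0 else -1.0, lv.vector))
  val svm = new SVM().setBlocks(1)
  svm.fit(binarized)
  svm
}

Classifying a vector would then mean asking each per-class model for its raw
decision value and picking the class with the largest one.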



RE: Wordindex conversation.

2016-10-10 Thread Kürşat Kurt
Ok, thanks Fabian.

 

From: Fabian Hueske [mailto:fhue...@gmail.com] 
Sent: Tuesday, October 11, 2016 1:12 AM
To: user@flink.apache.org
Subject: Re: Wordindex conversation.

 

Hi,

you can do it like this:

 

1) you have to split each label record of the main dataset into separate 
records:


(0,List(a, b, c, d, e, f, g)) -> (0, a), (0, b), (0, c), ..., (0, g)
(1,List(b, c, f, a, g)) -> (1, b), (1, c), ..., (1, g)

2) Join the word index dataset with the split main dataset:

DataSet<Tuple2<Long, String>> splittedMain = ...

DataSet<Tuple2<Long, String>> wordIdx = ...

DataSet<Tuple2<Long, Long>> joined =
splittedMain.join(wordIdx).where(1).equalTo(1).with(...)

3) Group by label:

DataSet<Tuple2<Long, List<Long>>> labelsWithIdx =
joined.groupBy(0).reduceGroup(...) // collect all indexes in a list / array

Best, Fabian
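
A minimal end-to-end sketch of these three steps in the Scala DataSet API (a
hedged illustration; the object name is an assumption, and the order of
indexes inside each result list is not guaranteed):

import org.apache.flink.api.scala._

object WordIndexConversion {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val mainDS = env.fromElements(
      (0L, List("a", "b", "c", "d", "e", "f", "g")),
      (1L, List("b", "c", "f", "a", "g")))
    val wordIdx = env.fromElements(
      (0L, "a"), (1L, "b"), (2L, "c"), (3L, "d"),
      (4L, "e"), (5L, "f"), (6L, "g"))

    // 1) Split each (label, words) record into one (label, word) per word.
    val splitted = mainDS.flatMap(r => r._2.map(w => (r._1, w)))

    // 2) Join on the word (field 1 on both sides) to attach each word's index.
    val joined = splitted.join(wordIdx).where(1).equalTo(1) { (l, r) => (l._1, r._1) }

    // 3) Group by label and collect the word indexes into a list.
    val labelsWithIdx = joined.groupBy(0).reduceGroup { it =>
      val rows = it.toList
      (rows.head._1, rows.map(_._2))
    }
    labelsWithIdx.print()
  }
}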

 

 

2016-10-10 23:49 GMT+02:00 Kürşat Kurt <kur...@kursatkurt.com>:

Hi;

 

I have a MainDataset (Label, WordList):

 

(0,List(a, b, c, d, e, f, g))

(1,List(b, c, f, a, g))

 

...and a wordIndex dataset (created with .zipWithIndex):

 

wordIndex> (0,a)

wordIndex> (1,b)

wordIndex> (2,c)

wordIndex> (3,d)

wordIndex> (4,e)

wordIndex> (5,f)

wordIndex> (6,g)

 

How can I convert mainDataset to an indexed wordList dataset like this:

(0,List(1,2,3,4,5,6))

(1,List(2,3,5,0,6))

 

 



Wordindex conversation.

2016-10-10 Thread Kürşat Kurt
Hi;

 

I have a MainDataset (Label, WordList):

 

(0,List(a, b, c, d, e, f, g))

(1,List(b, c, f, a, g))

 

...and a wordIndex dataset (created with .zipWithIndex):

 

wordIndex> (0,a)

wordIndex> (1,b)

wordIndex> (2,c)

wordIndex> (3,d)

wordIndex> (4,e)

wordIndex> (5,f)

wordIndex> (6,g)

 

How can I convert mainDataset to an indexed wordList dataset like this:

(0,List(1,2,3,4,5,6))

(1,List(2,3,5,0,6))

 



SVM classification problem.

2016-09-30 Thread Kürşat Kurt
Hi;

 

I am trying to train and predict with the same set. I expect the accuracy
should be 100%; am I wrong?

If I try to predict with the same set, it fails, and it also predicts the
class "-1", which is not in the training set.

What is wrong with this code?

 

Code:

def main(args: Array[String]): Unit = {

  val env = ExecutionEnvironment.getExecutionEnvironment

  val training = Seq(
    new LabeledVector(1.0, new SparseVector(10, Array(0, 2, 3), Array(1.0, 1.0, 1.0))),
    new LabeledVector(1.0, new SparseVector(10, Array(0, 1, 5, 9), Array(1.0, 1.0, 1.0, 1.0))),
    new LabeledVector(0.0, new SparseVector(10, Array(0, 2), Array(0.0, 1.0))),
    new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0))),
    new LabeledVector(0.0, new SparseVector(10, Array(0, 2), Array(0.0, 1.0))),
    new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0))))

  val trainingDS = env.fromCollection(training)
  val testingDS = env.fromCollection(training)

  val svm = new SVM().setBlocks(env.getParallelism)
  svm.fit(trainingDS)

  val predictions = svm.evaluate(testingDS.map(x => (x.vector, x.label)))
  predictions.print()
}

 

Output:

(1.0,1.0)

(1.0,1.0)

(0.0,1.0)

(0.0,-1.0)

(0.0,1.0)

(0.0,-1.0)
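
As noted under "SVM Multiclass classification" above, the -1 predictions come
from FlinkML's SVM being a binary classifier that thresholds its output to
+1/-1, so labels outside {-1, 1} cannot be reproduced. A hedged sketch of
computing the accuracy from the (truth, prediction) pairs that evaluate()
returns, assuming the labels were remapped to +1/-1 before training:

// Fraction of (truth, prediction) pairs that agree; training.size is known
// on the client because `training` is a local Seq.
val accuracy = predictions
  .map(p => if (p._1 == p._2) 1.0 else 0.0)
  .reduce(_ + _)
  .map(_ / training.size)
accuracy.print()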