Re: [Help] Error when using the keyBy operator inside a Flink ML iteration

2024-06-03 Thread Xiqian YU
Hello!

This looks related to FLINK-35066 [1], which describes an unwrapping problem that occurs when a custom RichCoProcessFunction or CoFlatMapFunction operator is implemented inside an IterationBody; it can be traced back to this earlier report [2] on the mailing list. The same problem appears to affect the RichCoMapFunction operator you are using.

The issue has been fixed by this pull request [3], which has been merged into the master branch. Following the documentation [4], I built the 2.4-SNAPSHOT version locally and ran your code, and it appears to work correctly.

Since this is a known issue in Flink ML 2.3, you can either build your own snapshot version locally, or wait for the Flink ML 2.4 release and upgrade your dependency.

Best wishes!

Regards,
yux

[1] https://issues.apache.org/jira/browse/FLINK-35066
[2] https://lists.apache.org/thread/bgkw1g2tdgnp1xy1clsqtcfs3h18pkd6
[3] https://github.com/apache/flink-ml/pull/260
[4] https://github.com/apache/flink-ml#building-the-project
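For reference, building the snapshot locally is roughly the following (commands assumed from the build documentation [4]; consult it for prerequisites such as the required JDK and Maven versions):

```shell
# Clone the repository and build all modules, skipping tests for speed.
git clone https://github.com/apache/flink-ml.git
cd flink-ml
mvn clean package -DskipTests

# Install the 2.4-SNAPSHOT artifacts into the local Maven repository so a
# downstream project can depend on them.
mvn install -DskipTests
```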


From: w...@shanghaitech.edu.cn 
Date: Friday, May 31, 2024 at 17:34
To: user-zh@flink.apache.org 
Subject: [Help] Error when using the keyBy operator inside a Flink ML iteration

Dear Flink developers,

I ran into a problem while using the iteration feature of the Flink ML module. When I use the keyBy operator inside the iteration body, the following error occurs:

Caused by: java.lang.ClassCastException: org.apache.flink.iteration.IterationRecord cannot be cast to java.lang.String

I have consulted the documentation but am still at a loss, so I would greatly appreciate your help. Thank you very much.

I have attached the minimal reproducible code, the error output, and my environment information below.

The minimal reproducible code:

~~~java
package myflinkml;

import org.apache.flink.iteration.DataStreamList;
import org.apache.flink.iteration.IterationBody;
import org.apache.flink.iteration.IterationBodyResult;
import org.apache.flink.iteration.Iterations;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.RichCoMapFunction;

public class BugDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        DataStream<String> textStream = env.fromElements("Hello", "Flink");
        DataStream<Integer> intStream = env.fromElements(1, 2, 3);

        Iterations.iterateUnboundedStreams(
                DataStreamList.of(intStream),
                DataStreamList.of(textStream),
                new Body()
        ).get(0).print();

        env.execute();
    }

    private static class Body implements IterationBody {

        @Override
        public IterationBodyResult process(DataStreamList dsl1, DataStreamList dsl2) {
            DataStream<Integer> intStream = dsl1.get(0);
            DataStream<String> textStream = dsl2.get(0);

            // Iteration output stream
            DataStream<String> outStream = textStream
                    .connect(intStream)
                    .keyBy(x -> 1, x -> 1)  // Adding this line triggers the error!!
                    .map(new RichCoMapFunction<String, Integer, String>() {

                        @Override
                        public String map1(String value) throws Exception {
                            return "Strings: " + value;
                        }

                        @Override
                        public String map2(Integer value) throws Exception {
                            return "Integer: " + value;
                        }
                    });

            // Iteration feedback stream
            SingleOutputStreamOperator<Integer> feedBackStream =
                    intStream.map(x -> x - 1).filter(x -> x > 0);

            return new IterationBodyResult(
                    DataStreamList.of(feedBackStream),
                    DataStreamList.of(outStream));
        }
    }
}
~~~

The error output:

Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
    at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
    at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:267)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1300)
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
    at java.util.concurrent.Com

How to define the termination criteria for iterations in Flink ML?

2024-03-28 Thread Komal M
Hi all,

I have another question regarding Flink ML’s iterations.

In the documentation it says “The iterative algorithm has an iteration body 
that is repeatedly invoked until some termination criteria is reached (e.g. 
after a user-specified number of epochs has been reached).”

My question is what exactly is the syntax (in Java) to define the number of 
epochs in the code? Or any other termination criteria inside the iteration body?

I’m using Flink v1.17.2, Flink ML v.2.3.0

Thanks in advance,
Komal
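One concrete pattern for the question above (a sketch against the Flink ML 2.3 iteration API; `TerminateOnMaxIter` is the utility in `org.apache.flink.ml.common.iteration` that several built-in algorithms use, but verify the class against your version): `IterationBodyResult` takes an optional third argument, a termination-criteria stream, and the iteration stops at the end of the first epoch in which that stream produces no records.

```java
package example;

import org.apache.flink.iteration.DataStreamList;
import org.apache.flink.iteration.IterationBody;
import org.apache.flink.iteration.IterationBodyResult;
import org.apache.flink.ml.common.iteration.TerminateOnMaxIter;
import org.apache.flink.streaming.api.datastream.DataStream;

/** Iteration body that stops after a fixed number of epochs. */
public class MaxEpochBody implements IterationBody {

    private final int maxEpochs;

    public MaxEpochBody(int maxEpochs) {
        this.maxEpochs = maxEpochs;
    }

    @Override
    public IterationBodyResult process(
            DataStreamList variableStreams, DataStreamList dataStreams) {
        DataStream<Integer> variable = variableStreams.get(0);

        // Placeholder for one epoch of work; a real algorithm would
        // update its model state here.
        DataStream<Integer> feedback = variable.map(x -> x - 1);

        // TerminateOnMaxIter emits one record per epoch while the epoch
        // watermark is below maxEpochs; once it stops emitting, the
        // criteria stream is empty for a whole epoch and the loop ends.
        DataStream<Integer> terminationCriteria =
                feedback.flatMap(new TerminateOnMaxIter<>(maxEpochs));

        return new IterationBodyResult(
                DataStreamList.of(feedback),   // fed back into the next epoch
                DataStreamList.of(feedback),   // exposed as iteration output
                terminationCriteria);          // optional termination criteria
    }
}
```

Such a body would typically be passed to `Iterations.iterateBoundedStreamsUntilTermination(...)`; a convergence-based criterion (e.g. `TerminateOnMaxIterOrTol`, if available in your version) can replace the fixed epoch count.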







Re: Flink ML

2023-08-02 Thread yidan zhao
That depends on the kind of model. For example, most scikit-learn models in Python can be exported to the PMML format, and Java can then load them with the JPMML library for prediction. There is an equivalent path for TensorFlow models as well, though I don't remember its name; you can look it up.
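To make the sklearn-to-PMML path concrete, here is a minimal sketch of scoring a PMML file inside a Flink map function with JPMML-Evaluator. The class, path, and field handling are illustrative assumptions; the API shown follows recent JPMML-Evaluator versions, where field names are plain Strings (older versions use `FieldName`).

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.EvaluatorUtil;
import org.jpmml.evaluator.InputField;
import org.jpmml.evaluator.LoadingModelEvaluatorBuilder;

/** Scores each record against a PMML model exported from scikit-learn. */
public class PmmlScoringFunction extends RichMapFunction<double[], Map<String, ?>> {

    private transient Evaluator evaluator;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Load and verify the model once per task, not once per record.
        evaluator = new LoadingModelEvaluatorBuilder()
                .load(new File("/path/to/model.pmml"))  // illustrative path
                .build();
        evaluator.verify();
    }

    @Override
    public Map<String, ?> map(double[] features) {
        // Bind incoming values to the model's declared input fields,
        // assuming the array is ordered to match them.
        Map<String, Object> arguments = new HashMap<>();
        int i = 0;
        for (InputField field : evaluator.getInputFields()) {
            arguments.put(field.getName(), features[i++]);
        }
        // Decode JPMML's internal value wrappers into plain Java objects.
        return EvaluatorUtil.decodeAll(evaluator.evaluate(arguments));
    }
}
```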

15904502343 <15904502...@163.com> wrote on Tue, Aug 1, 2023 at 16:48:
>
> Hello,
> I would like to know whether there is a code example for loading a pre-trained encoding model (written in Python) in a Flink program.


Flink ML

2023-08-01 Thread 15904502343
Hello,
I would like to know whether there is a code example for loading a pre-trained encoding model (written in Python) in a Flink program.

[ANNOUNCE] Apache Flink ML 2.1.0 released

2022-07-12 Thread Zhipeng Zhang
The Apache Flink community is excited to announce the release of Flink ML
2.1.0!


This release focuses on improving Flink ML's infrastructure, such as Python
SDK, memory management, and benchmark framework, to facilitate the
development of performant, memory-safe, and easy-to-use algorithm
libraries. We validated the enhanced infrastructure via benchmarks and
confirmed that Flink ML can meet or exceed the performance of selected
algorithms from alternative popular ML libraries. In addition, this release
added example Python and Java programs to help users learn and use Flink ML.


Please check out the release blog post for an overview of the release:

https://flink.apache.org/news/2022/07/12/release-ml-2.1.0.html


The release is available for download at:

https://flink.apache.org/downloads.html



Maven artifacts for Flink ML can be found at:

https://search.maven.org/search?q=g:org.apache.flink%20ml



Python SDK for Flink ML published to the PyPI index can be found at:

https://pypi.org/project/apache-flink-ml/



The full release notes are available in Jira:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351141


We would like to thank all contributors of the Apache Flink community who
made this release possible!



Regards,

Yun and Zhipeng



flink-ml onlinekmeans

2022-06-09 Thread Natia Chachkhiani
Hi,

I have a couple of questions about the OnlineKmeans algorithm. I am running
OnlineKmeans on a small dataset (36k records) with k=2 and testing varying
decay rates. The features are consumed in Flink from a Kafka topic.
Sample feature: [0.0, 5.0, 1.0, 0.0, 0.0, 0.0, 0.0]

Is the implementation of flink-ml similar to spark's streamingKmeans?
Should I expect similar results when running the same dataset through both?
I am getting very different results. spark is learning much better.

Does the rate of feature ingestion affect the results? Is it supposed to do
`fit` on every point?
Is `fit` slower than `transform` operation?
Thanks for your help!

Sample code I am running:

final DenseVector[] trainData =
new DenseVector[] {
Vectors.dense(1.0, 7.0, 1.0, 0.0, 0.0, 0.0, 0.0),
Vectors.dense(1.0, 15.0, 1.0, 0.0, 0.0, 0.0, 0.0)
};

String featuresCol = "features";
String predictionCol = "prediction";

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
env.enableCheckpointing(100);
env.setRestartStrategy(RestartStrategies.noRestart());
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

KMeans kMeans = new KMeans().setFeaturesCol(featuresCol).setPredictionCol(predictionCol);
Table offlineTrainTable = tEnv.fromDataStream(env.fromElements(trainData)).as(featuresCol);
KMeansModel model = kMeans.fit(offlineTrainTable);

KafkaSource inputSource =
KafkaSource.builder()
.setBootstrapServers(bootstrapServer)
.setTopics("sample_topic")
.setGroupId("sample_group")
.setStartingOffsets(OffsetsInitializer.latest())
.setValueOnlyDeserializer(new JsonFeatureSchema())
.build();


DataStream messageStream =
        env.fromSource(inputSource, WatermarkStrategy.noWatermarks(), "Kafka Source");


Table input = tEnv.fromDataStream(messageStream).as(featuresCol);
OnlineKMeans onlineKMeans =
new OnlineKMeans()
.setGlobalBatchSize(1)
.setFeaturesCol(featuresCol)
.setPredictionCol(predictionCol)
.setDecayFactor(0.3)
.setInitialModelData(model.getModelData()[0]);

OnlineKMeansModel onlineModel = onlineKMeans.fit(input);
Table outputTable = onlineModel.transform(input)[0];


Re: flink-ml algorithms

2022-06-06 Thread Natia Chachkhiani
Hi, I have another question. Is the implementation of kmeans in flink-ml
same as Spark's StreamingKmeans?
Should the accuracy/results from the same dataset be comparable between the
two?

On Sun, Jun 5, 2022 at 8:14 PM Natia Chachkhiani <
natia.chachkhia...@gmail.com> wrote:

> Thanks for the reply Zhipeng and Jing.
> Running the OnlineKmeans with a fixed initial model removed the randomness!
>
>
> On Sun, Jun 5, 2022 at 6:19 PM Zhipeng Zhang 
> wrote:
>
>> Hi Natia,
>>
>> As I understand, the processing order of onlineKmeans is the same the
>> input data.
>>
>> Are you running OnlineKmeans with using one data point with random
>> initial KmeansModel? Could you use a fixed initial model following [1] and
>> try out?
>>
>> [1]
>> https://github.com/apache/flink-ml/blob/239788f2b1f1f3a4e55ca112517980b598705a15/flink-ml-lib/src/test/java/org/apache/flink/ml/clustering/OnlineKMeansTest.java#L354
>>
>> Jing Ge  wrote on Fri, Jun 3, 2022 at 17:04:
>>
>>> Hi,
>>>
>>> It seems like an evaluation with a small dataset. In this case, would
>>> you like to share your data sample and code? In addition, have you tried
>>> KMeans with the same dataset and got inconsistent results too?
>>>
>>> Best regards,
>>> Jing
>>>
>>> On Fri, Jun 3, 2022 at 4:29 AM Natia Chachkhiani <
>>> natia.chachkhia...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running OnlineKmeans from flink-ml repo on a small dataset. I've
>>>> noticed that I don't get consistent results, assignments to clusters,
>>>> across different runs. I have set both parallelism and globalBatchSize to 
>>>> 1.
>>>> I am doing simple fit and transform on each data point ingested. Is the
>>>> order of processing not guaranteed? Or am I missing something?
>>>>
>>>> Thanks,
>>>> Natia
>>>>
>>>
>>
>> --
>> best,
>> Zhipeng
>>
>>


Re: flink-ml algorithms

2022-06-05 Thread Natia Chachkhiani
Thanks for the reply Zhipeng and Jing.
Running the OnlineKmeans with a fixed initial model removed the randomness!


On Sun, Jun 5, 2022 at 6:19 PM Zhipeng Zhang 
wrote:

> Hi Natia,
>
> As I understand, the processing order of onlineKmeans is the same the
> input data.
>
> Are you running OnlineKmeans with using one data point with random initial
> KmeansModel? Could you use a fixed initial model following [1] and try out?
>
> [1]
> https://github.com/apache/flink-ml/blob/239788f2b1f1f3a4e55ca112517980b598705a15/flink-ml-lib/src/test/java/org/apache/flink/ml/clustering/OnlineKMeansTest.java#L354
>
> Jing Ge  wrote on Fri, Jun 3, 2022 at 17:04:
>
>> Hi,
>>
>> It seems like an evaluation with a small dataset. In this case, would you
>> like to share your data sample and code? In addition, have you tried KMeans
>> with the same dataset and got inconsistent results too?
>>
>> Best regards,
>> Jing
>>
>> On Fri, Jun 3, 2022 at 4:29 AM Natia Chachkhiani <
>> natia.chachkhia...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am running OnlineKmeans from flink-ml repo on a small dataset. I've
>>> noticed that I don't get consistent results, assignments to clusters,
>>> across different runs. I have set both parallelism and globalBatchSize to 1.
>>> I am doing simple fit and transform on each data point ingested. Is the
>>> order of processing not guaranteed? Or am I missing something?
>>>
>>> Thanks,
>>> Natia
>>>
>>
>
> --
> best,
> Zhipeng
>
>


Re: flink-ml algorithms

2022-06-05 Thread Zhipeng Zhang
Hi Natia,

As I understand it, the processing order of onlineKmeans is the same as the
order of the input data.

Are you running OnlineKmeans using one data point with a random initial
KmeansModel? Could you use a fixed initial model, following [1], and try it out?

[1]
https://github.com/apache/flink-ml/blob/239788f2b1f1f3a4e55ca112517980b598705a15/flink-ml-lib/src/test/java/org/apache/flink/ml/clustering/OnlineKMeansTest.java#L354
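For reference, "a fixed initial model" per the linked test looks roughly like the following sketch (this assumes Flink ML 2.x's `KMeansModelData(centroids, weights)` constructor; the centroid values are illustrative, and the test at [1] is the authoritative version):

```java
import org.apache.flink.ml.clustering.kmeans.KMeansModelData;
import org.apache.flink.ml.linalg.DenseVector;
import org.apache.flink.ml.linalg.Vectors;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class FixedInitialModel {

    /** Builds a deterministic initial model table for OnlineKMeans. */
    public static Table buildInitModelTable(
            StreamExecutionEnvironment env, StreamTableEnvironment tEnv) {
        // Fixed starting centroids (and per-centroid weights), so repeated
        // runs start from the same state instead of a random one.
        KMeansModelData initModelData =
                new KMeansModelData(
                        new DenseVector[] {
                            Vectors.dense(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
                            Vectors.dense(1.0, 10.0, 1.0, 0.0, 0.0, 0.0, 0.0)
                        },
                        Vectors.dense(1.0, 1.0));
        return tEnv.fromDataStream(env.fromElements(initModelData));
    }
}
```

An `OnlineKMeans` instance would then consume it via `.setInitialModelData(buildInitModelTable(env, tEnv))` instead of model data produced by a random or offline fit.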

Jing Ge  wrote on Fri, Jun 3, 2022 at 17:04:

> Hi,
>
> It seems like an evaluation with a small dataset. In this case, would you
> like to share your data sample and code? In addition, have you tried KMeans
> with the same dataset and got inconsistent results too?
>
> Best regards,
> Jing
>
> On Fri, Jun 3, 2022 at 4:29 AM Natia Chachkhiani <
> natia.chachkhia...@gmail.com> wrote:
>
>> Hi,
>>
>> I am running OnlineKmeans from flink-ml repo on a small dataset. I've
>> noticed that I don't get consistent results, assignments to clusters,
>> across different runs. I have set both parallelism and globalBatchSize to 1.
>> I am doing simple fit and transform on each data point ingested. Is the
>> order of processing not guaranteed? Or am I missing something?
>>
>> Thanks,
>> Natia
>>
>

-- 
best,
Zhipeng


Re: flink-ml algorithms

2022-06-03 Thread Jing Ge
Hi,

It seems like an evaluation with a small dataset. In this case, would you
like to share your data sample and code? In addition, have you tried KMeans
with the same dataset and got inconsistent results too?

Best regards,
Jing

On Fri, Jun 3, 2022 at 4:29 AM Natia Chachkhiani <
natia.chachkhia...@gmail.com> wrote:

> Hi,
>
> I am running OnlineKmeans from flink-ml repo on a small dataset. I've
> noticed that I don't get consistent results, assignments to clusters,
> across different runs. I have set both parallelism and globalBatchSize to 1.
> I am doing simple fit and transform on each data point ingested. Is the
> order of processing not guaranteed? Or am I missing something?
>
> Thanks,
> Natia
>


flink-ml algorithms

2022-06-02 Thread Natia Chachkhiani
Hi,

I am running OnlineKmeans from flink-ml repo on a small dataset. I've
noticed that I don't get consistent results, assignments to clusters,
across different runs. I have set both parallelism and globalBatchSize to 1.
I am doing simple fit and transform on each data point ingested. Is the
order of processing not guaranteed? Or am I missing something?

Thanks,
Natia


Re: Flink-ML: Sink model data in online training

2022-01-27 Thread Zhipeng Zhang
Hi thekingofcity,

Thanks for your interest! Unfortunately we don't have an example for online
learning for now.

We are working on an online machine learning example. Hopefully it will be
added here [1] in the next three weeks.


[1] https://github.com/apache/flink-ml

thekingofcity  wrote on Wed, Jan 26, 2022 at 16:47:

> Hi,
>
> I want sink the model data (coefficient from the logsitic regression model
> in my case) from the flink.ml.api.Model to print or file. I figure out the
> way to sink it in the batch training mode but face the following exception
> when the Estimator takes an UNBOUNDED datastream.
>
> ```
> Caused by: java.lang.IllegalStateException: There can be only a single
> consumer in a FeedbackChannel.
> at
> org.apache.flink.statefun.flink.core.feedback.FeedbackChannel.registerConsumer(FeedbackChannel.java:79)
> ```
>
> This will happend if I dump it through the Table API like this:
>
> ```
> final TableDescriptor sinkDescriptor = TableDescriptor
> .forConnector("print")
> .schema(Schema
> .newBuilder()
> .column("coefficient", DataTypes.of(new
> DenseVectorTypeInfo()))
> .build()
> ).build();
> tEnv.createTemporaryTable("ModelSink", sinkDescriptor);
> model.getModelData()[0].executeInsert("ModelSink");
> ```
>
> Looking for an example that can sink the model data in online training
> mode.
>
> With many thanks,
> thekingofcity
>
>

-- 
best,
Zhipeng


Flink-ML: Sink model data in online training

2022-01-26 Thread thekingofcity
Hi,

I want to sink the model data (the coefficients from the logistic regression model, in my case) from the flink.ml.api.Model to print or to a file. I figured out how to sink it in batch training mode, but I face the following exception when the Estimator takes an UNBOUNDED datastream.

```
Caused by: java.lang.IllegalStateException: There can be only a single consumer in a FeedbackChannel.
    at org.apache.flink.statefun.flink.core.feedback.FeedbackChannel.registerConsumer(FeedbackChannel.java:79)
```

This happens if I dump it through the Table API like this:

```
final TableDescriptor sinkDescriptor = TableDescriptor
.forConnector("print")
.schema(Schema
.newBuilder()
.column("coefficient", DataTypes.of(new DenseVectorTypeInfo()))
.build()
).build();
tEnv.createTemporaryTable("ModelSink", sinkDescriptor);
model.getModelData()[0].executeInsert("ModelSink");
```

Looking for an example that can sink the model data in online training mode.

With many thanks,
thekingofcity

Re: Examples / Documentation for Flink ML 2

2022-01-21 Thread Dong Lin
Hey Bonino,

Sounds great. Since we have not set up the website for Flink ML yet, how
about we create PRs for https://github.com/apache/flink-ml and put those
Markdown files under flink-ml/docs?

Best Regards,
Dong

On Sat, Jan 22, 2022 at 12:25 AM Bonino Dario 
wrote:

> Hi Dong,
>
> We assembled a first, very small, Markdown document providing a jump-start
> description using a kMeans example. I could already share it with you to
> check if we are pointing in the right direction. I had a look at the Flink
> contribution guidelines, however the flink-ml project is  somewhat
> "separate" from Flink and the same I think holds for the documentation. How
> do you think it is better to proceed?
>
> Best regards
>
> Dario Bonino
> On 1/19/22 09:36, Dong Lin wrote:
>
> Hi Bonino,
>
> Definitely, it will be great to build up the Flink ML docs together based
> on your experience.
>
> Thanks!
> Dong
>
> On Wed, Jan 19, 2022 at 4:32 PM Bonino Dario 
> wrote:
>
>> Hi Dong,
>>
>> Thank you for the reply. Since we are actually experimenting with the
>> Flink ML libraries, If you think it's worth, we may contribute some
>> documentation, e.g., tutorial based on what we learn while setting up our
>> test project with Flink ML. Is it something that might be of interest for
>> you?
>>
>> Best regards
>>
>> Dario
>> On 1/18/22 04:51, Dong Lin wrote:
>>
>> Hi Bonino,
>>
>> Thanks for your interest!
>>
>> Flink ML is currently ready for experienced algorithm developers to try
>> it out because we have setup the basic APIs and infrastructure to develop
>> algorithms. Five algorithms (i.e. kmeans, naive bays, knn, logistic
>> regression and one-hot encoder) has been implemented in the last release.
>> Their unit tests can be found here
>> <https://github.com/apache/flink-ml/tree/master/flink-ml-lib/src/test/java/org/apache/flink/ml/feature>,
>> here
>> <https://github.com/apache/flink-ml/tree/master/flink-ml-lib/src/test/java/org/apache/flink/ml/clustering>
>> and here
>> <https://github.com/apache/flink-ml/tree/master/flink-ml-lib/src/test/java/org/apache/flink/ml/classification>,
>> which show how to use these algorithms (including transform/fit/save/load).
>> And from these unit tests you can find implementation of these algorithms
>> which can be used as reference implementation to develop other algorithms
>> of your interest.
>>
>> We plan to setup a website for Flink ML to provide links to
>> example/tutorial similar to the Flink Statefun website (link
>> <https://nightlies.apache.org/flink/flink-statefun-docs-stable/>). This
>> website will likely be setup in March. We are currently working on
>> developing further infrastructure for benchmarking and optimizing the
>> machine learning algorithms in Flink ML.
>>
>> Best Regards,
>> Dong
>>
>>
>>
>> On Mon, Jan 17, 2022 at 8:57 PM Dawid Wysakowicz 
>> wrote:
>>
>>> I am adding a couple of people who worked on it. Hopefully, they will be
>>> able to answer you.
>>> On 17/01/2022 13:39, Bonino Dario wrote:
>>>
>>> Dear List,
>>>
>>> We are in the process of evaluating Flink ML version 2.0 in the context
>>> of some ML task mainly concerned with classification and clustering.
>>>
>>> While algorithms for this 2 domains are already present, although in a
>>> limited form (perhaps) in the latest release of Flink ML, we did not found
>>> any example / documentation that could guide our experiments.
>>>
>>> Is some adoption example available, like code, tutorial or any
>>> information that might help us in bootstrapping a Flink ML 2 project?
>>>
>>> Thank you very much
>>>
>>> Best regards
>>>
>>> --
>>> Ing. Dario Bonino, Ph.D
>>>
>>> e-m@il: dario.bon...@gmail.com
>>> www: https://www.linkedin.com/in/dariobonino
>>> 
>>> Dario
>>> Bonino
>>> slide...@hotmail.com
>>> 
>>>
>>> --
>> Ing. Dario Bonino, Ph.D
>>
>> e-m@il: dario.bon...@gmail.com
>> www: https://www.linkedin.com/in/dariobonino
>> 
>>  Dario
>>  Bonino
>>  slide...@hotmail.com
>> 
>>
>> --
> Ing. Dario Bonino, Ph.D
>
> e-m@il: dario.bon...@gmail.com
> www: https://www.linkedin.com/in/dariobonino
> 
>   Dario
>   Bonino
>   slide...@hotmail.com
> 
>
>


Re: Examples / Documentation for Flink ML 2

2022-01-21 Thread Bonino Dario

Hi Dong,

We assembled a first, very small Markdown document providing a jump-start 
description using a kMeans example. I could share it with you already, to 
check that we are pointing in the right direction. I had a look at the 
Flink contribution guidelines; however, the flink-ml project is somewhat 
"separate" from Flink, and I think the same holds for its documentation. 
How do you think we should best proceed?


Best regards

Dario Bonino

On 1/19/22 09:36, Dong Lin wrote:

Hi Bonino,

Definitely, it will be great to build up the Flink ML docs together 
based on your experience.


Thanks!
Dong

On Wed, Jan 19, 2022 at 4:32 PM Bonino Dario  
wrote:


Hi Dong,

Thank you for the reply. Since we are actually experimenting with
    the Flink ML libraries, If you think it's worth, we may contribute
some documentation, e.g., tutorial based on what we learn while
setting up our test project with Flink ML. Is it something that
might be of interest for you?

Best regards

Dario

On 1/18/22 04:51, Dong Lin wrote:

Hi Bonino,

Thanks for your interest!

    Flink ML is currently ready for experienced algorithm developers
to try it out because we have setup the basic APIs and
infrastructure to develop algorithms. Five algorithms (i.e.
kmeans, naive bays, knn, logistic regression and one-hot encoder)
has been implemented in the last release. Their unit tests can be
found here

<https://github.com/apache/flink-ml/tree/master/flink-ml-lib/src/test/java/org/apache/flink/ml/feature>,
here

<https://github.com/apache/flink-ml/tree/master/flink-ml-lib/src/test/java/org/apache/flink/ml/clustering>
and here

<https://github.com/apache/flink-ml/tree/master/flink-ml-lib/src/test/java/org/apache/flink/ml/classification>,
which show how to use these algorithms (including
transform/fit/save/load). And from these unit tests you can find
implementation of these algorithms which can be used as reference
implementation to develop other algorithms of your interest.

We plan to setup a website for Flink ML to provide links to
example/tutorial similar to the Flink Statefun website (link
<https://nightlies.apache.org/flink/flink-statefun-docs-stable/>).
This website will likely be setup in March. We are currently
working on developing further infrastructure for benchmarking and
optimizing the machine learning algorithms in Flink ML.

Best Regards,
Dong



On Mon, Jan 17, 2022 at 8:57 PM Dawid Wysakowicz
 wrote:

I am adding a couple of people who worked on it. Hopefully,
they will be able to answer you.

On 17/01/2022 13:39, Bonino Dario wrote:


Dear List,

We are in the process of evaluating Flink ML version 2.0 in
the context of some ML task mainly concerned with
classification and clustering.

While algorithms for this 2 domains are already present,
although in a limited form (perhaps) in the latest release
of Flink ML, we did not found any example / documentation
that could guide our experiments.

Is some adoption example available, like code, tutorial or
any information that might help us in bootstrapping a Flink
ML 2 project?

Thank you very much

Best regards

-- 
Ing. Dario Bonino, Ph.D


e-m@il:dario.bon...@gmail.com  
www:https://www.linkedin.com/in/dariobonino


Dario
Bonino
slide...@hotmail.com



-- 
Ing. Dario Bonino, Ph.D


e-m@il:dario.bon...@gmail.com  
www:https://www.linkedin.com/in/dariobonino


Dario
Bonino
slide...@hotmail.com



--
Ing. Dario Bonino, Ph.D

e-m@il:dario.bon...@gmail.com  
www:https://www.linkedin.com/in/dariobonino


Dario
Bonino
slide...@hotmail.com



Re: Examples / Documentation for Flink ML 2

2022-01-19 Thread Dong Lin
Hi Bonino,

Definitely, it will be great to build up the Flink ML docs together based
on your experience.

Thanks!
Dong

On Wed, Jan 19, 2022 at 4:32 PM Bonino Dario  wrote:

> Hi Dong,
>
> Thank you for the reply. Since we are actually experimenting with the
> Flink ML libraries, If you think it's worth, we may contribute some
> documentation, e.g., tutorial based on what we learn while setting up our
> test project with Flink ML. Is it something that might be of interest for
> you?
>
> Best regards
>
> Dario
> On 1/18/22 04:51, Dong Lin wrote:
>
> Hi Bonino,
>
> Thanks for your interest!
>
> Flink ML is currently ready for experienced algorithm developers to try it
> out because we have setup the basic APIs and infrastructure to develop
> algorithms. Five algorithms (i.e. kmeans, naive bays, knn, logistic
> regression and one-hot encoder) has been implemented in the last release.
> Their unit tests can be found here
> <https://github.com/apache/flink-ml/tree/master/flink-ml-lib/src/test/java/org/apache/flink/ml/feature>,
> here
> <https://github.com/apache/flink-ml/tree/master/flink-ml-lib/src/test/java/org/apache/flink/ml/clustering>
> and here
> <https://github.com/apache/flink-ml/tree/master/flink-ml-lib/src/test/java/org/apache/flink/ml/classification>,
> which show how to use these algorithms (including transform/fit/save/load).
> And from these unit tests you can find implementation of these algorithms
> which can be used as reference implementation to develop other algorithms
> of your interest.
>
> We plan to setup a website for Flink ML to provide links to
> example/tutorial similar to the Flink Statefun website (link
> <https://nightlies.apache.org/flink/flink-statefun-docs-stable/>). This
> website will likely be setup in March. We are currently working on
> developing further infrastructure for benchmarking and optimizing the
> machine learning algorithms in Flink ML.
>
> Best Regards,
> Dong
>
>
>
> On Mon, Jan 17, 2022 at 8:57 PM Dawid Wysakowicz 
> wrote:
>
>> I am adding a couple of people who worked on it. Hopefully, they will be
>> able to answer you.
>> On 17/01/2022 13:39, Bonino Dario wrote:
>>
>> Dear List,
>>
>> We are in the process of evaluating Flink ML version 2.0 in the context
>> of some ML task mainly concerned with classification and clustering.
>>
>> While algorithms for this 2 domains are already present, although in a
>> limited form (perhaps) in the latest release of Flink ML, we did not found
>> any example / documentation that could guide our experiments.
>>
>> Is some adoption example available, like code, tutorial or any
>> information that might help us in bootstrapping a Flink ML 2 project?
>>
>> Thank you very much
>>
>> Best regards
>>
>> --
>> Ing. Dario Bonino, Ph.D
>>
>> e-m@il: dario.bon...@gmail.com
>> www: https://www.linkedin.com/in/dariobonino
>> 
>>  Dario
>>  Bonino
>>  slide...@hotmail.com
>> 
>>
>> --
> Ing. Dario Bonino, Ph.D
>
> e-m@il: dario.bon...@gmail.com
> www: https://www.linkedin.com/in/dariobonino
> 
>   Dario
>   Bonino
>   slide...@hotmail.com
> 
>
>


Re: Examples / Documentation for Flink ML 2

2022-01-19 Thread Bonino Dario

Hi Dong,

Thank you for the reply. Since we are actually experimenting with the 
Flink ML libraries, If you think it's worth, we may contribute some 
documentation, e.g., tutorial based on what we learn while setting up 
our test project with Flink ML. Is it something that might be of 
interest for you?


Best regards

Dario

On 1/18/22 04:51, Dong Lin wrote:

Hi Bonino,

Thanks for your interest!

Flink ML is currently ready for experienced algorithm developers to 
try it out because we have setup the basic APIs and infrastructure to 
develop algorithms. Five algorithms (i.e. kmeans, naive bays, knn, 
logistic regression and one-hot encoder) has been implemented in the 
last release. Their unit tests can be found here 
<https://github.com/apache/flink-ml/tree/master/flink-ml-lib/src/test/java/org/apache/flink/ml/feature>, 
here 
<https://github.com/apache/flink-ml/tree/master/flink-ml-lib/src/test/java/org/apache/flink/ml/clustering> 
and here 
<https://github.com/apache/flink-ml/tree/master/flink-ml-lib/src/test/java/org/apache/flink/ml/classification>, 
which show how to use these algorithms (including 
transform/fit/save/load). And from these unit tests you can find 
implementation of these algorithms which can be used as reference 
implementation to develop other algorithms of your interest.


We plan to setup a website for Flink ML to provide links to 
example/tutorial similar to the Flink Statefun website (link 
<https://nightlies.apache.org/flink/flink-statefun-docs-stable/>). 
This website will likely be setup in March. We are currently working 
on developing further infrastructure for benchmarking and optimizing 
the machine learning algorithms in Flink ML.


Best Regards,
Dong



On Mon, Jan 17, 2022 at 8:57 PM Dawid Wysakowicz 
 wrote:


I am adding a couple of people who worked on it. Hopefully, they
will be able to answer you.

On 17/01/2022 13:39, Bonino Dario wrote:


Dear List,

We are in the process of evaluating Flink ML version 2.0 in the
context of some ML tasks mainly concerned with classification and
clustering.

While algorithms for these two domains are already present in the
latest release of Flink ML, although perhaps in a limited form, we
did not find any example or documentation that could guide our
experiments.

Is any adoption example available, such as code, a tutorial, or any
other information that might help us bootstrap a Flink ML 2
project?

Thank you very much

Best regards

-- 
Ing. Dario Bonino, Ph.D


e-m@il:dario.bon...@gmail.com  
www:https://www.linkedin.com/in/dariobonino






--
Ing. Dario Bonino, Ph.D

e-m@il:dario.bon...@gmail.com  
www:https://www.linkedin.com/in/dariobonino





Re: Examples / Documentation for Flink ML 2

2022-01-17 Thread Dong Lin
Hi Bonino,

Thanks for your interest!

Flink ML is currently ready for experienced algorithm developers to try it
out, because we have set up the basic APIs and infrastructure to develop
algorithms. Five algorithms (i.e., k-means, naive Bayes, KNN, logistic
regression, and one-hot encoder) have been implemented in the last release.
Their unit tests can be found here
<https://github.com/apache/flink-ml/tree/master/flink-ml-lib/src/test/java/org/apache/flink/ml/feature>,
here
<https://github.com/apache/flink-ml/tree/master/flink-ml-lib/src/test/java/org/apache/flink/ml/clustering>
and here
<https://github.com/apache/flink-ml/tree/master/flink-ml-lib/src/test/java/org/apache/flink/ml/classification>,
which show how to use these algorithms (including transform/fit/save/load).
From these unit tests you can also find the implementations of these
algorithms, which can serve as reference implementations for developing
other algorithms of your interest.
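As a minimal, untested sketch of the fit/transform flow described above: the class names (KMeans, KMeansModel), the default "features" column, and the Vectors helper are assumed from the flink-ml-lib clustering tests linked in this message; verify the exact signatures against those tests before relying on them.

```java
import org.apache.flink.ml.clustering.kmeans.KMeans;
import org.apache.flink.ml.clustering.kmeans.KMeansModel;
import org.apache.flink.ml.linalg.DenseVector;
import org.apache.flink.ml.linalg.Vectors;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
import org.apache.flink.util.CloseableIterator;

public class KMeansSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Two obvious clusters of 2-d points, exposed as a Table with a "features" column.
        DataStream<DenseVector> points = env.fromElements(
                Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
                Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1));
        Table input = tEnv.fromDataStream(points).as("features");

        // fit() trains a model; transform() appends a cluster assignment per row.
        KMeans kmeans = new KMeans().setK(2).setSeed(1L);
        KMeansModel model = kmeans.fit(input);
        Table predictions = model.transform(input)[0];

        // Models can also be persisted and restored; see Stage#save and
        // KMeansModel#load in the linked unit tests for the exact signatures.
        try (CloseableIterator<Row> it = predictions.execute().collect()) {
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
    }
}
```

This mirrors the save/load-capable Estimator/Model pattern the unit tests exercise; running it requires the flink-ml-lib and flink-table dependencies on the classpath.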

We plan to set up a website for Flink ML that provides links to
examples and tutorials, similar to the Flink Statefun website (link
<https://nightlies.apache.org/flink/flink-statefun-docs-stable/>). This
website will likely be set up in March. We are currently working on
developing further infrastructure for benchmarking and optimizing the
machine learning algorithms in Flink ML.

Best Regards,
Dong



On Mon, Jan 17, 2022 at 8:57 PM Dawid Wysakowicz 
wrote:

> I am adding a couple of people who worked on it. Hopefully, they will be
> able to answer you.
> On 17/01/2022 13:39, Bonino Dario wrote:
>
> Dear List,
>
> We are in the process of evaluating Flink ML version 2.0 in the context of
> some ML tasks mainly concerned with classification and clustering.
>
> While algorithms for these two domains are already present in the latest
> release of Flink ML, although perhaps in a limited form, we did not find
> any example or documentation that could guide our experiments.
>
> Is any adoption example available, such as code, a tutorial, or any other
> information that might help us bootstrap a Flink ML 2 project?
>
> Thank you very much
>
> Best regards
>
> --
> Ing. Dario Bonino, Ph.D
>
> e-m@il: dario.bon...@gmail.com
> www: https://www.linkedin.com/in/dariobonino
> 
> 
>
>


Re: Examples / Documentation for Flink ML 2

2022-01-17 Thread Dawid Wysakowicz
I am adding a couple of people who worked on it. Hopefully, they will be
able to answer you.

On 17/01/2022 13:39, Bonino Dario wrote:
>
> Dear List,
>
> We are in the process of evaluating Flink ML version 2.0 in the
> context of some ML tasks mainly concerned with classification and
> clustering.
>
> While algorithms for these two domains are already present in the
> latest release of Flink ML, although perhaps in a limited form, we did
> not find any example or documentation that could guide our experiments.
>
> Is any adoption example available, such as code, a tutorial, or any
> other information that might help us bootstrap a Flink ML 2 project?
>
> Thank you very much
>
> Best regards
>
> -- 
> Ing. Dario Bonino, Ph.D
>
> e-m@il: dario.bon...@gmail.com 
> www: https://www.linkedin.com/in/dariobonino
> 
>  




Examples / Documentation for Flink ML 2

2022-01-17 Thread Bonino Dario

Dear List,

We are in the process of evaluating Flink ML version 2.0 in the context 
of some ML tasks mainly concerned with classification and clustering.


While algorithms for these two domains are already present in the latest 
release of Flink ML, although perhaps in a limited form, we did not 
find any example or documentation that could guide our experiments.


Is any adoption example available, such as code, a tutorial, or any 
other information that might help us bootstrap a Flink ML 2 project?


Thank you very much

Best regards

--
Ing. Dario Bonino, Ph.D

e-m@il:dario.bon...@gmail.com  
www:https://www.linkedin.com/in/dariobonino





Re: [ANNOUNCE] Apache Flink ML 2.0.0 released

2022-01-10 Thread Till Rohrmann
This is really great news. Thanks a lot for all the work Dong, Yun, Zhipeng
and others!

Cheers,
Till

On Fri, Jan 7, 2022 at 2:36 PM David Morávek  wrote:

> Great job! <3 Thanks Dong and Yun for managing the release and big thanks
> to everyone who has contributed!
>
> Best,
> D.
>
> On Fri, Jan 7, 2022 at 2:27 PM Yun Gao  wrote:
>
>> The Apache Flink community is very happy to announce the release of
>> Apache Flink ML 2.0.0.
>>
>>
>>
>> Apache Flink ML provides API and infrastructure that simplifies
>> implementing distributed ML algorithms,
>>
>> and it also provides a library of off-the-shelf ML algorithms.
>>
>>
>>
>> Please check out the release blog post for an overview of the release:
>>
>> https://flink.apache.org/news/2022/01/07/release-ml-2.0.0.html
>>
>>
>>
>> The release is available for download at:
>>
>> https://flink.apache.org/downloads.html
>>
>>
>>
>> Maven artifacts for Flink ML can be found at:
>>
>> https://search.maven.org/search?q=g:org.apache.flink%20ml
>>
>>
>>
>> Python SDK for Flink ML published to the PyPI index can be found at:
>>
>> https://pypi.org/project/apache-flink-ml/
>>
>>
>>
>> The full release notes are available in Jira:
>>
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351079
>>
>>
>>
>> We would like to thank all contributors of the Apache Flink community who
>> made this release possible!
>>
>>
>>
>> Regards,
>>
>> Dong and Yun
>>
>



Re: [ANNOUNCE] Apache Flink ML 2.0.0 released

2022-01-07 Thread David Morávek
Great job! <3 Thanks Dong and Yun for managing the release and big thanks
to everyone who has contributed!

Best,
D.

On Fri, Jan 7, 2022 at 2:27 PM Yun Gao  wrote:

> The Apache Flink community is very happy to announce the release of Apache
> Flink ML 2.0.0.
>
>
>
> Apache Flink ML provides API and infrastructure that simplifies
> implementing distributed ML algorithms,
>
> and it also provides a library of off-the-shelf ML algorithms.
>
>
>
> Please check out the release blog post for an overview of the release:
>
> https://flink.apache.org/news/2022/01/07/release-ml-2.0.0.html
>
>
>
> The release is available for download at:
>
> https://flink.apache.org/downloads.html
>
>
>
> Maven artifacts for Flink ML can be found at:
>
> https://search.maven.org/search?q=g:org.apache.flink%20ml
>
>
>
> Python SDK for Flink ML published to the PyPI index can be found at:
>
> https://pypi.org/project/apache-flink-ml/
>
>
>
> The full release notes are available in Jira:
>
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351079
>
>
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
>
>
>
> Regards,
>
> Dong and Yun
>



[ANNOUNCE] Apache Flink ML 2.0.0 released

2022-01-07 Thread Yun Gao
The Apache Flink community is very happy to announce the release of Apache 
Flink ML 2.0.0.

Apache Flink ML provides API and infrastructure that simplifies implementing 
distributed ML algorithms, 
and it also provides a library of off-the-shelf ML algorithms.

Please check out the release blog post for an overview of the release:
https://flink.apache.org/news/2022/01/07/release-ml-2.0.0.html

The release is available for download at:
https://flink.apache.org/downloads.html

Maven artifacts for Flink ML can be found at:
https://search.maven.org/search?q=g:org.apache.flink%20ml

Python SDK for Flink ML published to the PyPI index can be found at:
https://pypi.org/project/apache-flink-ml/

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351079

We would like to thank all contributors of the Apache Flink community who made 
this release possible!

Regards,
Dong and Yun


Re: Flink ML

2020-06-18 Thread Jark Wu
Currently, FLIP-39 is mainly driven by Becket and his team. I'm including
him, maybe he can answer your question.

Best,
Jark

On Wed, 17 Jun 2020 at 23:00, Piotr Nowojski  wrote:

> Hi,
>
> It looks like FLIP-39 is only partially implemented as of now [1], so I’m
> not sure which features are already done. I’m including Shaoxuan Wang in
> this thread, maybe he will be able to better answer your question.
>
> Piotrek
>
> [1] https://issues.apache.org/jira/browse/FLINK-12470
>
> On 16 Jun 2020, at 14:55, Dimitris Vogiatzidakis <
> dimitrisvogiatzida...@gmail.com> wrote:
>
> Hello,
> I'm a CS student currently working on my Bachelor's thesis. I've used
> Flink to extract features out of some datasets, and I would like to use
> them together with another dataset of (1,0) (node exists or doesn't) to
> perform a logistic regression. I have found that FLIP-39 has been accepted
> and is available in version 1.10.0, which I also currently use, but I'm
> having trouble implementing it. Are there any Java examples currently up
> and running? Or could you propose a different way to perform the task?
> Thank you.
>
> -Dimitris Vogiatzidakis
>
>
>


Re: Flink ML

2020-06-17 Thread Piotr Nowojski
Hi,

It looks like FLIP-39 is only partially implemented as of now [1], so I’m not 
sure which features are already done. I’m including Shaoxuan Wang in this 
thread, maybe he will be able to better answer your question.

Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-12470 


> On 16 Jun 2020, at 14:55, Dimitris Vogiatzidakis 
>  wrote:
> 
> Hello,
> I'm a CS student currently working on my Bachelor's thesis. I've used Flink 
> to extract features out of some datasets, and I would like to use them 
> together with another dataset of (1,0) (node exists or doesn't) to perform a 
> logistic regression. I have found that FLIP-39 has been accepted and is 
> available in version 1.10.0, which I also currently use, but I'm having 
> trouble implementing it. Are there any Java examples currently up and 
> running? Or could you propose a different way to perform the task? 
> Thank you.
> 
> -Dimitris Vogiatzidakis 



Flink ML

2020-06-16 Thread Dimitris Vogiatzidakis
Hello,
I'm a CS student currently working on my Bachelor's thesis. I've used Flink
to extract features out of some datasets, and I would like to use them
together with another dataset of (1,0) (node exists or doesn't) to perform
a logistic regression. I have found that FLIP-39 has been accepted and is
available in version 1.10.0, which I also currently use, but I'm having
trouble implementing it. Are there any Java examples currently up and
running? Or could you propose a different way to perform the task?
Thank you.

-Dimitris Vogiatzidakis


Re: Alink and Flink ML

2020-03-09 Thread Flavio Pompermaier
Thanks Marta for the clarification!

On Mon, Mar 9, 2020 at 3:26 PM Marta Paes Moreira 
wrote:

> Hi, Flavio.
>
> Indeed, Becket is the best person to answer this question, but as far as I
> understand the idea is that Alink will be contributed back to Flink in the
> form of a refactored Flink ML library (sitting on top of the Table API)
> [1]. You can follow the progress of these efforts by tracking FLIP-39 [2].
>
> [1] https://developpaper.com/why-is-flink-ai-worth-looking-forward-to/
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
>
> On Tue, Mar 3, 2020 at 2:02 PM Gary Yao  wrote:
>
>> Hi Flavio,
>>
>> I am looping in Becket (cc'ed) who might be able to answer your question.
>>
>> Best,
>> Gary
>>
>> On Tue, Mar 3, 2020 at 12:19 PM Flavio Pompermaier 
>> wrote:
>>
>>> Hi to all,
>>> since Alink has been open sourced, is there any good reason to keep both
>>> Flink ML and Alink?
>>> From what I understood, Alink already contains the best ML implementation
>>> available for Flink. Am I wrong?
>>> Maybe it could make sense to replace the current Flink ML with that of
>>> Alink, or is that impossible?
>>>
>>> Cheers,
>>> Flavio
>>>
>>


Re: Alink and Flink ML

2020-03-09 Thread Marta Paes Moreira
Hi, Flavio.

Indeed, Becket is the best person to answer this question, but as far as I
understand the idea is that Alink will be contributed back to Flink in the
form of a refactored Flink ML library (sitting on top of the Table API)
[1]. You can follow the progress of these efforts by tracking FLIP-39 [2].

[1] https://developpaper.com/why-is-flink-ai-worth-looking-forward-to/
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs

On Tue, Mar 3, 2020 at 2:02 PM Gary Yao  wrote:

> Hi Flavio,
>
> I am looping in Becket (cc'ed) who might be able to answer your question.
>
> Best,
> Gary
>
> On Tue, Mar 3, 2020 at 12:19 PM Flavio Pompermaier 
> wrote:
>
>> Hi to all,
>> since Alink has been open sourced, is there any good reason to keep both
>> Flink ML and Alink?
>> From what I understood, Alink already contains the best ML implementation
>> available for Flink. Am I wrong?
>> Maybe it could make sense to replace the current Flink ML with that of
>> Alink, or is that impossible?
>>
>> Cheers,
>> Flavio
>>
>


Re: Alink and Flink ML

2020-03-03 Thread Gary Yao
Hi Flavio,

I am looping in Becket (cc'ed) who might be able to answer your question.

Best,
Gary

On Tue, Mar 3, 2020 at 12:19 PM Flavio Pompermaier 
wrote:

> Hi to all,
> since Alink has been open sourced, is there any good reason to keep both
> Flink ML and Alink?
> From what I understood, Alink already contains the best ML implementation
> available for Flink. Am I wrong?
> Maybe it could make sense to replace the current Flink ML with that of
> Alink, or is that impossible?
>
> Cheers,
> Flavio
>


Alink and Flink ML

2020-03-03 Thread Flavio Pompermaier
Hi to all,
since Alink has been open sourced, is there any good reason to keep both
Flink ML and Alink?
From what I understood, Alink already contains the best ML implementation
available for Flink. Am I wrong?
Maybe it could make sense to replace the current Flink ML with that of
Alink, or is that impossible?

Cheers,
Flavio


Re: Flink ML feature

2019-12-12 Thread Rong Rong
Hi guys,

Yes, as Till mentioned. The community is working on a new ML library and we
are working closely with the Alink project to bring the algorithms.

You can find more information regarding the new ML design architecture in
FLIP-39 [1].
One of the major changes is that the new ML library [2] will be based on the
Table API [3], instead of depending on the DataSet/DataStream APIs.

I've cc-ed @Xu Yang , who has been a major
contributor to the Alink project to provide more information.

--
Rong

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
[2] https://github.com/apache/flink/tree/master/flink-ml-parent
[3]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/tableApi.html
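To make the Table-based design concrete, the following is a rough sketch of the core FLIP-39 interfaces referenced above. It is illustrative only: the names and shapes follow the FLIP-39 proposal (reduced to their essentials, omitting the PipelineStage/parameter machinery), and the exact signatures may differ from what eventually shipped in flink-ml-parent.

```java
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

// A Transformer maps an input Table to an output Table
// (e.g., feature scaling, or a fitted model scoring rows).
interface Transformer<T extends Transformer<T>> {
    Table transform(TableEnvironment tEnv, Table input);
}

// A Model is a Transformer produced by training.
interface Model<M extends Model<M>> extends Transformer<M> {
}

// An Estimator learns a Model from an input Table
// (e.g., a classifier or clustering trainer).
interface Estimator<E extends Estimator<E, M>, M extends Model<M>> {
    M fit(TableEnvironment tEnv, Table input);
}
```

In this design a Pipeline chains Transformers and Estimators; fitting the pipeline fits each Estimator stage in order and yields a sequence of fitted stages that can transform new Tables, which is what lets the library sit on top of the Table API rather than the DataSet/DataStream APIs.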

On Wed, Dec 11, 2019 at 1:11 AM Till Rohrmann  wrote:

> Hi guys,
>
> it is true that we dropped Flink-ML with 1.9. The reason is that the
> community started working on a new ML library which you can find under
> flink-ml-parent [1]. This module contains the framework for building ML
> pipelines but not yet too many algorithms iirc. The plan is to extend this
> library with algorithms from Alink in the near future to grow Flink's
> machine learning library.
>
> [1] https://github.com/apache/flink/tree/master/flink-ml-parent
>
> Cheers,
> Till
>
> On Wed, Dec 11, 2019 at 3:42 AM vino yang  wrote:
>
>> Hi Benoit,
>>
>> I can only try to ping @Till Rohrmann  @Kurt Young
>>   who may know more information to answer this
>> question.
>>
>> Best,
>> Vino
>>
>> Benoît Paris  于2019年12月10日周二
>> 下午7:06写道:
>>
>>> Is there any information as to whether Alink is going to be contributed
>>> to Apache Flink as the official ML Lib?
>>>
>>>
>>> On Tue, Dec 10, 2019 at 7:11 AM vino yang  wrote:
>>>
>>>> Hi Chandu,
>>>>
>>>> AFAIK, there is a project named Alink[1] which is the Machine Learning
>>>> algorithm platform based on Flink, developed by the PAI team of Alibaba
>>>> computing platform. FYI
>>>>
>>>> Best,
>>>> Vino
>>>>
>>>> [1]: https://github.com/alibaba/Alink
>>>>
>>>> Tom Blackwood  于2019年12月10日周二 下午2:07写道:
>>>>
>>>>> You may try Spark ML, which is a production ready library for ML stuff.
>>>>>
>>>>> regards.
>>>>>
>>>>> On Tue, Dec 10, 2019 at 1:04 PM chandu soa 
>>>>> wrote:
>>>>>
>>>>>> Hello Community,
>>>>>>
>>>>>> Can you please give me some pointers for implementing Machine
>>>>>> Learning using Flink.
>>>>>>
>>>>>> I see Flink ML libraries were dropped in v1.9. It looks like ML
>>>>>> feature in Flink going to be enhanced.
>>>>>>
>>>>>> What is the recommended approach for implementing production grade ML
>>>>>> based apps using Flink? Is v1.9 OK, or should I wait for 1.10?
>>>>>>
>>>>>> Thanks,
>>>>>> Chandu
>>>>>>
>>>>>
>>>
>>> --
>>> Benoît Paris
>>> Ingénieur Machine Learning Explicable
>>> Tél : +33 6 60 74 23 00
>>> http://benoit.paris
>>> http://explicable.ml
>>>
>>


Re: Flink ML feature

2019-12-11 Thread Till Rohrmann
Hi guys,

it is true that we dropped Flink ML with 1.9. The reason is that the
community started working on a new ML library, which you can find under
flink-ml-parent [1]. This module contains the framework for building ML
pipelines but not yet many algorithms, IIRC. The plan is to extend this
library with algorithms from Alink in the near future to grow Flink's
machine learning library.

[1] https://github.com/apache/flink/tree/master/flink-ml-parent

Cheers,
Till

On Wed, Dec 11, 2019 at 3:42 AM vino yang  wrote:

> Hi Benoit,
>
> I can only try to ping @Till Rohrmann  @Kurt Young
>   who may know more information to answer this question.
>
> Best,
> Vino
>
> Benoît Paris  于2019年12月10日周二 下午7:06写道:
>
>> Is there any information as to whether Alink is going to be contributed
>> to Apache Flink as the official ML Lib?
>>
>>
>> On Tue, Dec 10, 2019 at 7:11 AM vino yang  wrote:
>>
>>> Hi Chandu,
>>>
>>> AFAIK, there is a project named Alink[1] which is the Machine Learning
>>> algorithm platform based on Flink, developed by the PAI team of Alibaba
>>> computing platform. FYI
>>>
>>> Best,
>>> Vino
>>>
>>> [1]: https://github.com/alibaba/Alink
>>>
>>> Tom Blackwood  于2019年12月10日周二 下午2:07写道:
>>>
>>>> You may try Spark ML, which is a production ready library for ML stuff.
>>>>
>>>> regards.
>>>>
>>>> On Tue, Dec 10, 2019 at 1:04 PM chandu soa  wrote:
>>>>
>>>>> Hello Community,
>>>>>
>>>>> Can you please give me some pointers for implementing Machine Learning
>>>>> using Flink.
>>>>>
>>>>> I see Flink ML libraries were dropped in v1.9. It looks like ML
>>>>> feature in Flink going to be enhanced.
>>>>>
>>>>> What is the recommended approach for implementing production grade ML
>>>>> based apps using Flink? Is v1.9 OK, or should I wait for 1.10?
>>>>>
>>>>> Thanks,
>>>>> Chandu
>>>>>
>>>>
>>
>> --
>> Benoît Paris
>> Ingénieur Machine Learning Explicable
>> Tél : +33 6 60 74 23 00
>> http://benoit.paris
>> http://explicable.ml
>>
>


Re: Flink ML feature

2019-12-10 Thread vino yang
Hi Benoit,

I can only try to ping @Till Rohrmann  @Kurt Young
  who may know more information to answer this question.

Best,
Vino

Benoît Paris  于2019年12月10日周二 下午7:06写道:

> Is there any information as to whether Alink is going to be contributed to
> Apache Flink as the official ML Lib?
>
>
> On Tue, Dec 10, 2019 at 7:11 AM vino yang  wrote:
>
>> Hi Chandu,
>>
>> AFAIK, there is a project named Alink[1] which is the Machine Learning
>> algorithm platform based on Flink, developed by the PAI team of Alibaba
>> computing platform. FYI
>>
>> Best,
>> Vino
>>
>> [1]: https://github.com/alibaba/Alink
>>
>> Tom Blackwood  于2019年12月10日周二 下午2:07写道:
>>
>>> You may try Spark ML, which is a production ready library for ML stuff.
>>>
>>> regards.
>>>
>>> On Tue, Dec 10, 2019 at 1:04 PM chandu soa  wrote:
>>>
>>>> Hello Community,
>>>>
>>>> Can you please give me some pointers for implementing Machine Learning
>>>> using Flink.
>>>>
>>>> I see Flink ML libraries were dropped in v1.9. It looks like ML feature
>>>> in Flink going to be enhanced.
>>>>
>>>> What is the recommended approach for implementing production grade ML
>>>> based apps using Flink? Is v1.9 OK, or should I wait for 1.10?
>>>>
>>>> Thanks,
>>>> Chandu
>>>>
>>>
>
> --
> Benoît Paris
> Ingénieur Machine Learning Explicable
> Tél : +33 6 60 74 23 00
> http://benoit.paris
> http://explicable.ml
>


Re: Flink ML feature

2019-12-10 Thread Benoît Paris
Is there any information as to whether Alink is going to be contributed to
Apache Flink as the official ML Lib?


On Tue, Dec 10, 2019 at 7:11 AM vino yang  wrote:

> Hi Chandu,
>
> AFAIK, there is a project named Alink[1] which is the Machine Learning
> algorithm platform based on Flink, developed by the PAI team of Alibaba
> computing platform. FYI
>
> Best,
> Vino
>
> [1]: https://github.com/alibaba/Alink
>
> Tom Blackwood  于2019年12月10日周二 下午2:07写道:
>
>> You may try Spark ML, which is a production ready library for ML stuff.
>>
>> regards.
>>
>> On Tue, Dec 10, 2019 at 1:04 PM chandu soa  wrote:
>>
>>> Hello Community,
>>>
>>> Can you please give me some pointers for implementing Machine Learning
>>> using Flink.
>>>
>>> I see Flink ML libraries were dropped in v1.9. It looks like ML feature
>>> in Flink going to be enhanced.
>>>
>>> What is the recommended approach for implementing production grade ML
>>> based apps using Flink? Is v1.9 OK, or should I wait for 1.10?
>>>
>>> Thanks,
>>> Chandu
>>>
>>

-- 
Benoît Paris
Ingénieur Machine Learning Explicable
Tél : +33 6 60 74 23 00
http://benoit.paris
http://explicable.ml


Re: Flink ML feature

2019-12-09 Thread vino yang
Hi Chandu,

AFAIK, there is a project named Alink[1], a machine learning algorithm
platform based on Flink, developed by the PAI team of the Alibaba
computing platform. FYI.

Best,
Vino

[1]: https://github.com/alibaba/Alink

Tom Blackwood  于2019年12月10日周二 下午2:07写道:

> You may try Spark ML, which is a production ready library for ML stuff.
>
> regards.
>
> On Tue, Dec 10, 2019 at 1:04 PM chandu soa  wrote:
>
>> Hello Community,
>>
>> Can you please give me some pointers for implementing Machine Learning
>> using Flink.
>>
>> I see Flink ML libraries were dropped in v1.9. It looks like ML feature
>> in Flink going to be enhanced.
>>
>> What is the recommended approach for implementing production grade ML
>> based apps using Flink? Is v1.9 OK, or should I wait for 1.10?
>>
>> Thanks,
>> Chandu
>>
>


Re: Flink ML feature

2019-12-09 Thread Tom Blackwood
You may try Spark ML, which is a production ready library for ML stuff.

regards.

On Tue, Dec 10, 2019 at 1:04 PM chandu soa  wrote:

> Hello Community,
>
> Can you please give me some pointers for implementing Machine Learning
> using Flink.
>
> I see Flink ML libraries were dropped in v1.9. It looks like ML feature in
> Flink going to be enhanced.
>
> What is the recommended approach for implementing production grade ML
> based apps using Flink? Is v1.9 OK, or should I wait for 1.10?
>
> Thanks,
> Chandu
>


Flink ML feature

2019-12-09 Thread chandu soa
Hello Community,

Can you please give me some pointers for implementing machine learning
using Flink?

I see the Flink ML libraries were dropped in v1.9. It looks like the ML
feature in Flink is going to be enhanced.

What is the recommended approach for implementing production-grade ML-based
apps using Flink? Is v1.9 OK, or should I wait for 1.10?

Thanks,
Chandu


Re: [SURVEY] Usage of flink-ml and [DISCUSS] Delete flink-ml

2019-05-28 Thread Till Rohrmann
+1 for removing it. I think it is effectively dead by now.

Cheers,
Till

On Mon, May 27, 2019 at 4:00 PM Hequn Cheng  wrote:

> Hi Shaoxuan,
>
> Thanks a lot for driving this. +1 to remove the module.
>
> The git log of this module shows that it has been inactive for a long
> time. I think it's ok to remove it for now. It would also be good to switch
> to the new interface earlier.
>
> Best, Hequn
>
> On Mon, May 27, 2019 at 8:58 PM Becket Qin  wrote:
>
>> +1 for removal. Personally I'd prefer marking it as deprecated and remove
>> the module in the next release, just to follow the established procedure.
>>
>> And +1 on removing the `flink-libraries/flink-ml-uber` as well.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Mon, May 27, 2019 at 5:07 PM jincheng sun 
>> wrote:
>>
>>> +1 for remove it!
>>>
>>> And we also plan to delete the `flink-libraries/flink-ml-uber`, right?
>>>
>>> Best,
>>> Jincheng
>>>
>>> Rong Rong  于2019年5月24日周五 上午1:18写道:
>>>
>>>> +1 for the deletion.
>>>>
>>>> Also I think it also might be a good idea to update the roadmap for the
>>>> plan of removal/development since we've reached the consensus on FLIP-39.
>>>>
>>>> Thanks,
>>>> Rong
>>>>
>>>>
>>>> On Wed, May 22, 2019 at 8:26 AM Shaoxuan Wang 
>>>> wrote:
>>>>
>>>>> Hi Chesnay,
>>>>> Yes, you are right. There is no active commit planned for the
>>>>> legacy Flink ML package. It does not matter whether we delete it now or
>>>>> later. I will open a PR and remove it.
>>>>>
>>>>> Shaoxuan
>>>>>
>>>>> On Wed, May 22, 2019 at 7:05 PM Chesnay Schepler 
>>>>> wrote:
>>>>>
>>>>>> I believe we can remove it regardless since users could just use the
>>>>>> 1.8
>>>>>> version against future releases.
>>>>>>
>>>>>> Generally speaking, any library/connector that is no longer actively
>>>>>> developed can be removed from the project as existing users can
>>>>>> always
>>>>>> rely on previous versions, which should continue to work by virtue of
>>>>>> working against @Stable APIs.
>>>>>>
>>>>>> On 22/05/2019 12:08, Shaoxuan Wang wrote:
>>>>>> > Hi Flink community,
>>>>>> >
>>>>>> > We plan to delete/deprecate the legacy flink-libraries/flink-ml
>>>>>> package in
>>>>>> > Flink1.9, and replace it with the new flink-ml interface proposed
>>>>>> in FLIP39
>>>>>> > (FLINK-12470).
>>>>>> > Before we remove this package, I want to reach out to you and ask
>>>>>> if there
>>>>>> > is any active project still uses this package. Please respond to
>>>>>> this
>>>>>> > thread and outline how you use flink-libraries/flink-ml.
>>>>>> > Depending on the replies of activity and adoption
>>>>>> > of flink-libraries/flink-ml, we will decide to either delete this
>>>>>> package
>>>>>> > in Flink1.9 or deprecate it for now & remove it in the next release
>>>>>> after
>>>>>> > 1.9.
>>>>>> >
>>>>>> > Thanks for your attention and help!
>>>>>> >
>>>>>> > Regards,
>>>>>> > Shaoxuan
>>>>>> >
>>>>>>
>>>>>>


Re: [SURVEY] Usage of flink-ml and [DISCUSS] Delete flink-ml

2019-05-27 Thread Hequn Cheng
Hi Shaoxuan,

Thanks a lot for driving this. +1 to remove the module.

The git log of this module shows that it has been inactive for a long time.
I think it's ok to remove it for now. It would also be good to switch to
the new interface earlier.

Best, Hequn

On Mon, May 27, 2019 at 8:58 PM Becket Qin  wrote:

> +1 for removal. Personally I'd prefer marking it as deprecated and remove
> the module in the next release, just to follow the established procedure.
>
> And +1 on removing the `flink-libraries/flink-ml-uber` as well.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, May 27, 2019 at 5:07 PM jincheng sun 
> wrote:
>
>> +1 for remove it!
>>
>> And we also plan to delete the `flink-libraries/flink-ml-uber`, right?
>>
>> Best,
>> Jincheng
>>
>> Rong Rong  于2019年5月24日周五 上午1:18写道:
>>
>>> +1 for the deletion.
>>>
>>> Also I think it also might be a good idea to update the roadmap for the
>>> plan of removal/development since we've reached the consensus on FLIP-39.
>>>
>>> Thanks,
>>> Rong
>>>
>>>
>>> On Wed, May 22, 2019 at 8:26 AM Shaoxuan Wang 
>>> wrote:
>>>
>>>> Hi Chesnay,
>>>> Yes, you are right. There is no active commit planned for the
>>>> legacy Flink ML package. It does not matter whether we delete it now or
>>>> later. I will open a PR and remove it.
>>>>
>>>> Shaoxuan
>>>>
>>>> On Wed, May 22, 2019 at 7:05 PM Chesnay Schepler 
>>>> wrote:
>>>>
>>>>> I believe we can remove it regardless since users could just use the
>>>>> 1.8
>>>>> version against future releases.
>>>>>
>>>>> Generally speaking, any library/connector that is no longer actively
>>>>> developed can be removed from the project as existing users can always
>>>>> rely on previous versions, which should continue to work by virtue of
>>>>> working against @Stable APIs.
>>>>>
>>>>> On 22/05/2019 12:08, Shaoxuan Wang wrote:
>>>>> > Hi Flink community,
>>>>> >
>>>>> > We plan to delete/deprecate the legacy flink-libraries/flink-ml
>>>>> package in
>>>>> > Flink1.9, and replace it with the new flink-ml interface proposed in
>>>>> FLIP39
>>>>> > (FLINK-12470).
>>>>> > Before we remove this package, I want to reach out to you and ask if
>>>>> there
>>>>> > is any active project still uses this package. Please respond to this
>>>>> > thread and outline how you use flink-libraries/flink-ml.
>>>>> > Depending on the replies of activity and adoption
>>>>> > of flink-libraries/flink-ml, we will decide to either delete this
>>>>> package
>>>>> > in Flink1.9 or deprecate it for now & remove it in the next release
>>>>> after
>>>>> > 1.9.
>>>>> >
>>>>> > Thanks for your attention and help!
>>>>> >
>>>>> > Regards,
>>>>> > Shaoxuan
>>>>> >
>>>>>
>>>>>


Re: [SURVEY] Usage of flink-ml and [DISCUSS] Delete flink-ml

2019-05-27 Thread Becket Qin
+1 for removal. Personally I'd prefer marking it as deprecated and removing
the module in the next release, just to follow the established procedure.

And +1 on removing the `flink-libraries/flink-ml-uber` as well.

Thanks,

Jiangjie (Becket) Qin

On Mon, May 27, 2019 at 5:07 PM jincheng sun 
wrote:

> +1 for removing it!
>
> And we also plan to delete the `flink-libraries/flink-ml-uber`, right?
>
> Best,
> Jincheng
>
> On Fri, May 24, 2019 at 1:18 AM, Rong Rong  wrote:
>
>> +1 for the deletion.
>>
>> Also, I think it might be a good idea to update the roadmap with the
>> plan for removal/development, since we've reached consensus on FLIP-39.
>>
>> Thanks,
>> Rong
>>
>>
>> On Wed, May 22, 2019 at 8:26 AM Shaoxuan Wang 
>> wrote:
>>
>>> Hi Chesnay,
>>> Yes, you are right. There are no active commits planned for the
>>> legacy Flink-ml package; it does not matter whether we delete it now or later. I will
>>> open a PR and remove it.
>>>
>>> Shaoxuan
>>>
>>> On Wed, May 22, 2019 at 7:05 PM Chesnay Schepler 
>>> wrote:
>>>
>>>> I believe we can remove it regardless since users could just use the
>>>> 1.8
>>>> version against future releases.
>>>>
>>>> Generally speaking, any library/connector that is no longer actively
>>>> developed can be removed from the project as existing users can always
>>>> rely on previous versions, which should continue to work by virtue of
>>>> working against @Stable APIs.
>>>>
>>>> On 22/05/2019 12:08, Shaoxuan Wang wrote:
>>>> > Hi Flink community,
>>>> >
>>>> > We plan to delete/deprecate the legacy flink-libraries/flink-ml
>>>> package in
>>>> > Flink1.9, and replace it with the new flink-ml interface proposed in
>>>> FLIP39
>>>> > (FLINK-12470).
>>>> > Before we remove this package, I want to reach out to you and ask if
>>>> there
>>>> > is any active project still uses this package. Please respond to this
>>>> > thread and outline how you use flink-libraries/flink-ml.
>>>> > Depending on the replies of activity and adoption
>>>> > of flink-libraries/flink-ml, we will decide to either delete this
>>>> package
>>>> > in Flink1.9 or deprecate it for now & remove it in the next release
>>>> after
>>>> > 1.9.
>>>> >
>>>> > Thanks for your attention and help!
>>>> >
>>>> > Regards,
>>>> > Shaoxuan
>>>> >
>>>>
>>>>


Re: [SURVEY] Usage of flink-ml and [DISCUSS] Delete flink-ml

2019-05-27 Thread jincheng sun
+1 for removing it!

And we also plan to delete the `flink-libraries/flink-ml-uber`, right?

Best,
Jincheng

On Fri, May 24, 2019 at 1:18 AM, Rong Rong  wrote:

> +1 for the deletion.
>
> Also, I think it might be a good idea to update the roadmap with the
> plan for removal/development, since we've reached consensus on FLIP-39.
>
> Thanks,
> Rong
>
>
> On Wed, May 22, 2019 at 8:26 AM Shaoxuan Wang  wrote:
>
>> Hi Chesnay,
>> Yes, you are right. There are no active commits planned for the legacy
>> Flink-ml package; it does not matter whether we delete it now or later. I will open a
>> PR and remove it.
>>
>> Shaoxuan
>>
>> On Wed, May 22, 2019 at 7:05 PM Chesnay Schepler 
>> wrote:
>>
>>> I believe we can remove it regardless since users could just use the 1.8
>>> version against future releases.
>>>
>>> Generally speaking, any library/connector that is no longer actively
>>> developed can be removed from the project as existing users can always
>>> rely on previous versions, which should continue to work by virtue of
>>> working against @Stable APIs.
>>>
>>> On 22/05/2019 12:08, Shaoxuan Wang wrote:
>>> > Hi Flink community,
>>> >
>>> > We plan to delete/deprecate the legacy flink-libraries/flink-ml
>>> package in
>>> > Flink1.9, and replace it with the new flink-ml interface proposed in
>>> FLIP39
>>> > (FLINK-12470).
>>> > Before we remove this package, I want to reach out to you and ask if
>>> there
>>> > is any active project still uses this package. Please respond to this
>>> > thread and outline how you use flink-libraries/flink-ml.
>>> > Depending on the replies of activity and adoption
>>> > of flink-libraries/flink-ml, we will decide to either delete this
>>> package
>>> > in Flink1.9 or deprecate it for now & remove it in the next release
>>> after
>>> > 1.9.
>>> >
>>> > Thanks for your attention and help!
>>> >
>>> > Regards,
>>> > Shaoxuan
>>> >
>>>
>>>


Re: Flink ML Use cases

2019-05-25 Thread Abhishek Singh
Thanks for the confirmation, Fabian.


*Regards,*
*Abhishek Kumar Singh*

*Search Engine Engineer*
*Mob :+91 7709735480 *


*...*


On Sat, May 25, 2019 at 8:55 PM Fabian Hueske  wrote:

> Hi Abhishek,
>
> Your observation is correct. Right now, the Flink ML module is in a
> half-baked state and is only supported in batch mode.
> It is not integrated with the DataStream API. FLIP-23 proposes a feature
> that allows evaluating an externally trained model (stored as PMML) on a
> stream of data.
>
> There is another effort to implement a new machine learning API /
> environment based on the Table API. This will be supported for batch and
> streaming sources.
> However, this effort has just started and the feature is not available yet.
>
> Best, Fabian
>
> On Sun, May 19, 2019 at 11:54 AM, Abhishek Singh <
> asingh2...@gmail.com> wrote:
>
>>
>> Thanks again for the above resources.
>>
>> I went through the project and also ran the example on my system to get a
>> grasp of the architecture.
>>
>> However, this project does not use Flink ML at all.
>>
>> Also, after having done enough research on Flink ML, I also found that it
>> does not let us persist the model, that's why I am not able to re-use the
>> model trained using Flink ML.
>>
>> It looks like Flink ML cannot really be used for real-life use cases as
>> it neither lets us persist the trained model, nor can it help us to use the
>> trained model on a *DataStream*.
>>
>> Please correct me if I am wrong.
>>
>>
>>
>>
>> *Regards,*
>> *Abhishek Kumar Singh*
>>
>> *Search Engine Engineer*
>> *Mob :+91 7709735480 *
>>
>>
>> *...*
>>
>>
>> On Wed, May 15, 2019 at 11:25 AM Abhishek Singh 
>> wrote:
>>
>>>
>>> Thanks a lot Rong and Sameer.
>>>
>>> Looks like this is what I wanted.
>>>
>>> I will try the above projects.
>>>
>>> *Regards,*
>>> *Abhishek Kumar Singh*
>>>
>>> *Search Engineer*
>>> *Mob :+91 7709735480 *
>>>
>>>
>>> *...*
>>>
>>>
>>> On Wed, May 15, 2019 at 8:00 AM Rong Rong  wrote:
>>>
>>>> Hi Abhishek,
>>>>
>>>> Based on your description, I think this FLIP proposal[1] seems to fit
>>>> perfectly for your use case.
>>>> You can also check out the GitHub repo by Boris (CCed) for the PMML
>>>> implementation[2]. This proposal is still under development [3], you are
>>>> more than welcome to test it out and share your feedback.
>>>>
>>>> Thanks,
>>>> Rong
>>>>
>>>> [1]
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-23+-+Model+Serving
>>>> [2] https://github.com/FlinkML/flink-modelServer /
>>>> https://github.com/FlinkML/flink-speculative-modelServer
>>>> [3] https://github.com/apache/flink/pull/7446
>>>>
>>>> On Tue, May 14, 2019 at 4:44 PM Sameer Wadkar 
>>>> wrote:
>>>>
>>>>> If you can save the model as a PMML file you can apply it on a stream
>>>>> using one of the java pmml libraries.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On May 14, 2019, at 4:44 PM, Abhishek Singh 
>>>>> wrote:
>>>>>
>>>>> I was looking forward to using Flink ML for my project where I think I
>>>>> can use SVM.
>>>>>
>>>>> I have been able to run a batch job using Flink ML and trained and
>>>>> tested my data.
>>>>>
>>>>> Now I want to do the following:
>>>>> 1. Applying the above-trained model to a stream of events from Kafka
>>>>> (using Data Streams): for this, I want to know if Flink ML can be used
>>>>> with Data Streams.
>>>>>
>>>>> 2. Persisting the model: I may want to save the trained model for some
>>>>> time in the future.
>>>>>
>>>>> Can the above 2 use cases be achieved using Apache Flink?
>>>>>
>>>>> *Regards,*
>>>>> *Abhishek Kumar Singh*
>>>>>
>>>>> *Search Engineer*
>>>>> *Mob :+91 7709735480 *
>>>>>
>>>>>
>>>>> *...*
>>>>>
>>>>>


Re: Flink ML Use cases

2019-05-25 Thread Fabian Hueske
Hi Abhishek,

Your observation is correct. Right now, the Flink ML module is in a
half-baked state and is only supported in batch mode.
It is not integrated with the DataStream API. FLIP-23 proposes a feature
that allows evaluating an externally trained model (stored as PMML) on a
stream of data.

There is another effort to implement a new machine learning API /
environment based on the Table API. This will be supported for batch and
streaming sources.
However, this effort has just started and the feature is not available yet.

Best, Fabian

On Sun, May 19, 2019 at 11:54 AM, Abhishek Singh <
asingh2...@gmail.com> wrote:

>
> Thanks again for the above resources.
>
> I went through the project and also ran the example on my system to get a
> grasp of the architecture.
>
> However, this project does not use Flink ML at all.
>
> Also, after having done enough research on Flink ML, I also found that it
> does not let us persist the model, that's why I am not able to re-use the
> model trained using Flink ML.
>
> It looks like Flink ML cannot really be used for real-life use cases as it
> neither lets us persist the trained model, nor can it help us to use the
> trained model on a *DataStream*.
>
> Please correct me if I am wrong.
>
>
>
>
> *Regards,*
> *Abhishek Kumar Singh*
>
> *Search Engine Engineer*
> *Mob :+91 7709735480 *
>
>
> *...*
>
>
> On Wed, May 15, 2019 at 11:25 AM Abhishek Singh 
> wrote:
>
>>
>> Thanks a lot Rong and Sameer.
>>
>> Looks like this is what I wanted.
>>
>> I will try the above projects.
>>
>> *Regards,*
>> *Abhishek Kumar Singh*
>>
>> *Search Engineer*
>> *Mob :+91 7709735480 *
>>
>>
>> *...*
>>
>>
>> On Wed, May 15, 2019 at 8:00 AM Rong Rong  wrote:
>>
>>> Hi Abhishek,
>>>
>>> Based on your description, I think this FLIP proposal[1] seems to fit
>>> perfectly for your use case.
>>> You can also check out the GitHub repo by Boris (CCed) for the PMML
>>> implementation[2]. This proposal is still under development [3], you are
>>> more than welcome to test it out and share your feedback.
>>>
>>> Thanks,
>>> Rong
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-23+-+Model+Serving
>>> [2] https://github.com/FlinkML/flink-modelServer /
>>> https://github.com/FlinkML/flink-speculative-modelServer
>>> [3] https://github.com/apache/flink/pull/7446
>>>
>>> On Tue, May 14, 2019 at 4:44 PM Sameer Wadkar 
>>> wrote:
>>>
>>>> If you can save the model as a PMML file you can apply it on a stream
>>>> using one of the java pmml libraries.
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On May 14, 2019, at 4:44 PM, Abhishek Singh 
>>>> wrote:
>>>>
>>>> I was looking forward to using Flink ML for my project where I think I
>>>> can use SVM.
>>>>
>>>> I have been able to run a batch job using Flink ML and trained and
>>>> tested my data.
>>>>
>>>> Now I want to do the following:
>>>> 1. Applying the above-trained model to a stream of events from Kafka
>>>> (using Data Streams): for this, I want to know if Flink ML can be used
>>>> with Data Streams.
>>>>
>>>> 2. Persisting the model: I may want to save the trained model for some
>>>> time in the future.
>>>>
>>>> Can the above 2 use cases be achieved using Apache Flink?
>>>>
>>>> *Regards,*
>>>> *Abhishek Kumar Singh*
>>>>
>>>> *Search Engineer*
>>>> *Mob :+91 7709735480 *
>>>>
>>>>
>>>> *...*
>>>>
>>>>


Re: [SURVEY] Usage of flink-ml and [DISCUSS] Delete flink-ml

2019-05-23 Thread Rong Rong
+1 for the deletion.

Also, I think it might be a good idea to update the roadmap with the
plan for removal/development, since we've reached consensus on FLIP-39.

Thanks,
Rong


On Wed, May 22, 2019 at 8:26 AM Shaoxuan Wang  wrote:

> Hi Chesnay,
> Yes, you are right. There are no active commits planned for the legacy
> Flink-ml package; it does not matter whether we delete it now or later. I will open a
> PR and remove it.
>
> Shaoxuan
>
> On Wed, May 22, 2019 at 7:05 PM Chesnay Schepler 
> wrote:
>
>> I believe we can remove it regardless since users could just use the 1.8
>> version against future releases.
>>
>> Generally speaking, any library/connector that is no longer actively
>> developed can be removed from the project as existing users can always
>> rely on previous versions, which should continue to work by virtue of
>> working against @Stable APIs.
>>
>> On 22/05/2019 12:08, Shaoxuan Wang wrote:
>> > Hi Flink community,
>> >
>> > We plan to delete/deprecate the legacy flink-libraries/flink-ml package
>> in
>> > Flink1.9, and replace it with the new flink-ml interface proposed in
>> FLIP39
>> > (FLINK-12470).
>> > Before we remove this package, I want to reach out to you and ask if
>> there
>> > is any active project still uses this package. Please respond to this
>> > thread and outline how you use flink-libraries/flink-ml.
>> > Depending on the replies of activity and adoption
>> > of flink-libraries/flink-ml, we will decide to either delete this
>> package
>> > in Flink1.9 or deprecate it for now & remove it in the next release
>> after
>> > 1.9.
>> >
>> > Thanks for your attention and help!
>> >
>> > Regards,
>> > Shaoxuan
>> >
>>
>>


Re: [SURVEY] Usage of flink-ml and [DISCUSS] Delete flink-ml

2019-05-22 Thread Shaoxuan Wang
Hi Chesnay,
Yes, you are right. There are no active commits planned for the legacy
Flink-ml package; it does not matter whether we delete it now or later. I will open a
PR and remove it.

Shaoxuan

On Wed, May 22, 2019 at 7:05 PM Chesnay Schepler  wrote:

> I believe we can remove it regardless since users could just use the 1.8
> version against future releases.
>
> Generally speaking, any library/connector that is no longer actively
> developed can be removed from the project as existing users can always
> rely on previous versions, which should continue to work by virtue of
> working against @Stable APIs.
>
> On 22/05/2019 12:08, Shaoxuan Wang wrote:
> > Hi Flink community,
> >
> > We plan to delete/deprecate the legacy flink-libraries/flink-ml package
> in
> > Flink1.9, and replace it with the new flink-ml interface proposed in
> FLIP39
> > (FLINK-12470).
> > Before we remove this package, I want to reach out to you and ask if
> there
> > is any active project still uses this package. Please respond to this
> > thread and outline how you use flink-libraries/flink-ml.
> > Depending on the replies of activity and adoption
> > of flink-libraries/flink-ml, we will decide to either delete this package
> > in Flink1.9 or deprecate it for now & remove it in the next release after
> > 1.9.
> >
> > Thanks for your attention and help!
> >
> > Regards,
> > Shaoxuan
> >
>
>


Re: [SURVEY] Usage of flink-ml and [DISCUSS] Delete flink-ml

2019-05-22 Thread Chesnay Schepler
I believe we can remove it regardless since users could just use the 1.8 
version against future releases.


Generally speaking, any library/connector that is no longer actively 
developed can be removed from the project as existing users can always 
rely on previous versions, which should continue to work by virtue of 
working against @Stable APIs.


On 22/05/2019 12:08, Shaoxuan Wang wrote:

Hi Flink community,

We plan to delete/deprecate the legacy flink-libraries/flink-ml package in
Flink1.9, and replace it with the new flink-ml interface proposed in FLIP39
(FLINK-12470).
Before we remove this package, I want to reach out to you and ask if there
is any active project still uses this package. Please respond to this
thread and outline how you use flink-libraries/flink-ml.
Depending on the replies of activity and adoption
of flink-libraries/flink-ml, we will decide to either delete this package
in Flink1.9 or deprecate it for now & remove it in the next release after
1.9.

Thanks for your attention and help!

Regards,
Shaoxuan





[SURVEY] Usage of flink-ml and [DISCUSS] Delete flink-ml

2019-05-22 Thread Shaoxuan Wang
Hi Flink community,

We plan to delete/deprecate the legacy flink-libraries/flink-ml package in
Flink1.9, and replace it with the new flink-ml interface proposed in FLIP39
(FLINK-12470).
Before we remove this package, I want to reach out to you and ask if there
is any active project still uses this package. Please respond to this
thread and outline how you use flink-libraries/flink-ml.
Depending on the replies of activity and adoption
of flink-libraries/flink-ml, we will decide to either delete this package
in Flink1.9 or deprecate it for now & remove it in the next release after
1.9.

Thanks for your attention and help!

Regards,
Shaoxuan


Re: Flink ML Use cases

2019-05-19 Thread Abhishek Singh
Thanks again for the above resources.

I went through the project and also ran the example on my system to get a
grasp of the architecture.

However, this project does not use Flink ML at all.

Also, after having done enough research on Flink ML, I also found that it
does not let us persist the model, that's why I am not able to re-use the
model trained using Flink ML.

It looks like Flink ML cannot really be used for real-life use cases as it
neither lets us persist the trained model, nor can it help us to use the
trained model on a *DataStream*.

Please correct me if I am wrong.




*Regards,*
*Abhishek Kumar Singh*

*Search Engine Engineer*
*Mob :+91 7709735480 *


*...*


On Wed, May 15, 2019 at 11:25 AM Abhishek Singh 
wrote:

>
> Thanks a lot Rong and Sameer.
>
> Looks like this is what I wanted.
>
> I will try the above projects.
>
> *Regards,*
> *Abhishek Kumar Singh*
>
> *Search Engineer*
> *Mob :+91 7709735480 *
>
>
> *...*
>
>
> On Wed, May 15, 2019 at 8:00 AM Rong Rong  wrote:
>
>> Hi Abhishek,
>>
>> Based on your description, I think this FLIP proposal[1] seems to fit
>> perfectly for your use case.
>> You can also check out the GitHub repo by Boris (CCed) for the PMML
>> implementation[2]. This proposal is still under development [3], you are
>> more than welcome to test it out and share your feedback.
>>
>> Thanks,
>> Rong
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-23+-+Model+Serving
>> [2] https://github.com/FlinkML/flink-modelServer /
>> https://github.com/FlinkML/flink-speculative-modelServer
>> [3] https://github.com/apache/flink/pull/7446
>>
>> On Tue, May 14, 2019 at 4:44 PM Sameer Wadkar 
>> wrote:
>>
>>> If you can save the model as a PMML file you can apply it on a stream
>>> using one of the java pmml libraries.
>>>
>>> Sent from my iPhone
>>>
>>> On May 14, 2019, at 4:44 PM, Abhishek Singh 
>>> wrote:
>>>
>>> I was looking forward to using Flink ML for my project where I think I
>>> can use SVM.
>>>
>>> I have been able to run a batch job using Flink ML and trained and tested
>>> my data.
>>>
>>> Now I want to do the following:
>>> 1. Applying the above-trained model to a stream of events from Kafka
>>> (using Data Streams): for this, I want to know if Flink ML can be used
>>> with Data Streams.
>>>
>>> 2. Persisting the model: I may want to save the trained model for some
>>> time in the future.
>>>
>>> Can the above 2 use cases be achieved using Apache Flink?
>>>
>>> *Regards,*
>>> *Abhishek Kumar Singh*
>>>
>>> *Search Engineer*
>>> *Mob :+91 7709735480 *
>>>
>>>
>>> *...*
>>>
>>>


Re: Flink ML Use cases

2019-05-15 Thread Abhishek Singh
Thanks a lot Rong and Sameer.

Looks like this is what I wanted.

I will try the above projects.

*Regards,*
*Abhishek Kumar Singh*

*Search Engineer*
*Mob :+91 7709735480 *


*...*


On Wed, May 15, 2019 at 8:00 AM Rong Rong  wrote:

> Hi Abhishek,
>
> Based on your description, I think this FLIP proposal[1] seems to fit
> perfectly for your use case.
> You can also check out the GitHub repo by Boris (CCed) for the PMML
> implementation[2]. This proposal is still under development [3], you are
> more than welcome to test it out and share your feedback.
>
> Thanks,
> Rong
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-23+-+Model+Serving
> [2] https://github.com/FlinkML/flink-modelServer /
> https://github.com/FlinkML/flink-speculative-modelServer
> [3] https://github.com/apache/flink/pull/7446
>
> On Tue, May 14, 2019 at 4:44 PM Sameer Wadkar  wrote:
>
>> If you can save the model as a PMML file you can apply it on a stream
>> using one of the java pmml libraries.
>>
>> Sent from my iPhone
>>
>> On May 14, 2019, at 4:44 PM, Abhishek Singh  wrote:
>>
>> I was looking forward to using Flink ML for my project where I think I
>> can use SVM.
>>
>> I have been able to run a batch job using Flink ML and trained and tested
>> my data.
>>
>> Now I want to do the following:
>> 1. Applying the above-trained model to a stream of events from Kafka
>> (using Data Streams): for this, I want to know if Flink ML can be used
>> with Data Streams.
>>
>> 2. Persisting the model: I may want to save the trained model for some
>> time in the future.
>>
>> Can the above 2 use cases be achieved using Apache Flink?
>>
>> *Regards,*
>> *Abhishek Kumar Singh*
>>
>> *Search Engineer*
>> *Mob :+91 7709735480 *
>>
>>
>> *...*
>>
>>


Re: Flink ML Use cases

2019-05-14 Thread Rong Rong
Hi Abhishek,

Based on your description, I think this FLIP proposal[1] seems to fit
perfectly for your use case.
You can also check out the GitHub repo by Boris (CCed) for the PMML
implementation[2]. This proposal is still under development [3], you are
more than welcome to test it out and share your feedback.

Thanks,
Rong

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-23+-+Model+Serving
[2] https://github.com/FlinkML/flink-modelServer /
https://github.com/FlinkML/flink-speculative-modelServer
[3] https://github.com/apache/flink/pull/7446

On Tue, May 14, 2019 at 4:44 PM Sameer Wadkar  wrote:

> If you can save the model as a PMML file you can apply it on a stream
> using one of the java pmml libraries.
>
> Sent from my iPhone
>
> On May 14, 2019, at 4:44 PM, Abhishek Singh  wrote:
>
> I was looking forward to using Flink ML for my project where I think I can
> use SVM.
>
> I have been able to run a batch job using Flink ML and trained and tested
> my data.
>
> Now I want to do the following:
> 1. Applying the above-trained model to a stream of events from Kafka
> (using Data Streams): for this, I want to know if Flink ML can be used
> with Data Streams.
>
> 2. Persisting the model: I may want to save the trained model for some
> time in the future.
>
> Can the above 2 use cases be achieved using Apache Flink?
>
> *Regards,*
> *Abhishek Kumar Singh*
>
> *Search Engineer*
> *Mob :+91 7709735480 *
>
>
> *...*
>
>


Re: Flink ML Use cases

2019-05-14 Thread Sameer Wadkar
If you can save the model as a PMML file you can apply it on a stream using one 
of the java pmml libraries. 

Sent from my iPhone

> On May 14, 2019, at 4:44 PM, Abhishek Singh  wrote:
> 
> I was looking forward to using Flink ML for my project where I think I can 
> use SVM.
> 
> I have been able to run a batch job using Flink ML and trained and tested my
> data.
>
> Now I want to do the following:
> 1. Applying the above-trained model to a stream of events from Kafka (using
> Data Streams): for this, I want to know if Flink ML can be used with Data
> Streams.
>
> 2. Persisting the model: I may want to save the trained model for some time
> in the future.
> 
> Can the above 2 use cases be achieved using Apache Flink? 
> 
> Regards,
> Abhishek Kumar Singh
> Search Engineer
> Mob :+91 7709735480 
> 
> 
> ...
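Sameer's suggestion boils down to a simple pattern: persist the model offline, load it once when the streaming operator starts (in Flink, a RichMapFunction's open()), and score each incoming record (in map()). Below is a minimal, self-contained sketch of that pattern in plain Java. The linear scorer and the serialized file format are illustrative stand-ins, not a real PMML evaluator, and all class and method names here are hypothetical:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Toy stand-in for an externally trained model: a linear scorer whose
// weights were fitted offline and then persisted to a file. In a real
// pipeline the file would be PMML and score() would call a PMML evaluator.
public class ModelOnStreamSketch {

    // "Model export": persist the trained weights.
    static void saveModel(File f, double[] weights) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(weights);
        }
    }

    // Load the model once at operator start-up (Flink: RichMapFunction#open).
    static double[] loadModel(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (double[]) in.readObject();
        }
    }

    // Score one event (Flink: RichMapFunction#map, called per record).
    static double score(double[] weights, double[] features) {
        double s = 0.0;
        for (int i = 0; i < weights.length; i++) {
            s += weights[i] * features[i];
        }
        return s;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("model", ".bin");
        saveModel(f, new double[] {0.5, -1.0, 2.0});   // output of offline training
        double[] model = loadModel(f);                 // done once per task, not per event
        double[][] events = {{1, 1, 1}, {2, 0, 1}};    // stand-in for a Kafka stream
        for (double[] event : events) {
            System.out.println(score(model, event));   // prints 1.5 then 3.0
        }
    }
}
```

In a real job, loadModel would instead parse a PMML file with a library such as JPMML-Evaluator, and score would delegate to that evaluator.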


Flink ML Use cases

2019-05-14 Thread Abhishek Singh
I was looking forward to using Flink ML for my project where I think I can
use SVM.

I have been able to run a batch job using Flink ML and trained and tested my
data.

Now I want to do the following:
1. Applying the above-trained model to a stream of events from Kafka
(using Data Streams): for this, I want to know if Flink ML can be used
with Data Streams.

2. Persisting the model: I may want to save the trained model for some time
in the future.

Can the above 2 use cases be achieved using Apache Flink?

*Regards,*
*Abhishek Kumar Singh*

*Search Engineer*
*Mob :+91 7709735480 *


*...*


Re: Random forest - Flink ML

2019-03-12 Thread Benoît Paris
There have been some developments at Apache SAMOA toward a forest of
decision trees.

This is not a regular Random Forest, but a form of trees that can be
incrementally learned fast. If I recall correctly, they also have adaptive
algorithms. Here are some resources:

* VHT: Vertical Hoeffding Tree

* Apache SAMOA

I don't know the current status of the project, nor have I ever tried
SAMOA, but this is something that could fit your needs.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Random forest - Flink ML

2019-03-12 Thread Avi Levi
Thanks Flavio,
I will definitely check it out. But from a quick glance, it seems to be
missing an implementation of "random forest", which is something we are
looking for.
If anyone can recommend/suggest/share that will be greatly appreciated.

Best Regards
Avi


On Mon, Mar 11, 2019 at 10:01 PM Flavio Pompermaier 
wrote:

> I know there's an ongoing, promising effort on improving Flink ML in the
> Streamline project [1], but I don't know why it's not more widely
> considered/advertised.
>
> Best,
> Flavio
>
> [1] https://h2020-streamline-project.eu/apache-flink/
>
> On Mon, Mar 11, 2019 at 3:40 PM, Avi Levi  wrote:
>
>> Hi,
>> According to Till's comment
>> <https://issues.apache.org/jira/browse/FLINK-1728?focusedCommentId=16780468&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16780468>,
>> I understand that flink-ml is going to be ditched. What will be the
>> alternative?
>> Looking for a "random forest" method that we can add to our pipeline
>> (Scala). Any suggestions?
>>
>> Thanks
>> Avi
>>
>>
>>
>>


Re: Random forest - Flink ML

2019-03-11 Thread Flavio Pompermaier
I know there's an ongoing, promising effort on improving Flink ML in the
Streamline project [1], but I don't know why it's not more widely
considered/advertised.

Best,
Flavio

[1] https://h2020-streamline-project.eu/apache-flink/

On Mon, Mar 11, 2019 at 3:40 PM, Avi Levi  wrote:

> Hi,
> According to Till's comment
> <https://issues.apache.org/jira/browse/FLINK-1728?focusedCommentId=16780468&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16780468>,
> I understand that flink-ml is going to be ditched. What will be the
> alternative?
> Looking for a "random forest" method that we can add to our pipeline
> (Scala). Any suggestions?
>
> Thanks
> Avi
>
>
>
>


Random forest - Flink ML

2019-03-11 Thread Avi Levi
Hi,
According to Till's comment
<https://issues.apache.org/jira/browse/FLINK-1728?focusedCommentId=16780468&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16780468>,
I understand that flink-ml is going to be ditched. What will be the
alternative?
Looking for a "random forest" method that we can add to our pipeline
(Scala). Any suggestions?

Thanks
Avi


Re: Using Flink Ml with DataStream

2017-11-03 Thread Adarsh Jain
Hi Chesnay,

Thanks for the reply, do you know how to serve using the trained model?

Where is the model saved?

Regards,
Adarsh



‌

On Wed, Nov 1, 2017 at 4:46 PM, Chesnay Schepler  wrote:

> I don't believe this to be possible. The ML library works exclusively with
> the Batch API.
>
>
> On 30.10.2017 12:52, Adarsh Jain wrote:
>
>
> Hi,
>
> Is there a way to use Stochastic Outlier Selection (SOS) and/or SVM using
> CoCoA with streaming data?
>
> Please suggest and give pointers.
>
> Regards,
> Adarsh
>
> ‌
>
>
>


Re: Using Flink Ml with DataStream

2017-11-01 Thread Chesnay Schepler
I don't believe this to be possible. The ML library works exclusively 
with the Batch API.


On 30.10.2017 12:52, Adarsh Jain wrote:


Hi,

Is there a way to use Stochastic Outlier Selection (SOS) and/or SVM 
using CoCoA with streaming data?


Please suggest and give pointers.

Regards,
Adarsh

‌





Using Flink Ml with DataStream

2017-10-30 Thread Adarsh Jain
Hi,

Is there a way to use Stochastic Outlier Selection (SOS) and/or SVM using
CoCoA with streaming data?

Please suggest and give pointers.

Regards,
Adarsh

‌


Re: Flink ML with DataStream

2017-07-21 Thread Fabian Hueske
Hi Jeremy,

here are a few links about the recent efforts for ML on streams with Flink:

- Discussion on the dev mailing list [1]
- Announcement of a Slack channel [2]
- GDocs Design Doc [3]

IMO, anomaly detection is a great use case for ML on streams.

Cheers, Fabian

[1]
https://lists.apache.org/thread.html/638fdee0c361a7fb362e050e8cc79ba1e8b4162b044bcbcca31d31ed@%3Cdev.flink.apache.org%3E
[2]
https://lists.apache.org/thread.html/e2a1f974300bf1f1b3ff19317a6b7fc941ebedd013950307959cf830@%3Cdev.flink.apache.org%3E
[3]
https://docs.google.com/document/d/1afQbvZBTV15qF3vobVWUjxQc49h3Ud06MIRhahtJ6dw

2017-07-21 21:57 GMT+02:00 Branham, Jeremy [IT] <jeremy.d.bran...@sprint.com
>:

> Thanks Fabian –
>
> I’m interested in the early development of ML on streams.
>
> Harshith and I plan on doing some prototyping for NRT anomaly detection
> leveraging the stream API.
>
> It would be great if we could produce something reusable for the community.
>
>
>
>
>
> *From:* Fabian Hueske [mailto:fhue...@gmail.com]
> *Sent:* Wednesday, July 19, 2017 2:12 PM
> *To:* Branham, Jeremy [IT] <jeremy.d.bran...@sprint.com>
> *Cc:* user@flink.apache.org
> *Subject:* Re: Flink ML with DataStream
>
>
>
> Hi,
>
> unfortunately, it is not possible to convert a DataStream into a DataSet.
>
> Flink's DataSet and DataStream APIs are distinct APIs that cannot be used
> together.
>
>
> The FlinkML library is only available for the DataSet API.
> There is some ongoing work to add a machine learning library for streaming
> use cases as well, but this is still in an early stage and mostly focusing
> on model serving on streams, i.e, applying an externally trained model on
> streaming data.
>
> Best, Fabian
>
>
>
>
>
> 2017-07-19 19:07 GMT+02:00 Branham, Jeremy [IT] <
> jeremy.d.bran...@sprint.com>:
>
> Hello –
>
> I’ve been successful working with Flink in Java, but have some trouble
> trying to leverage the ML library, specifically with KNN.
>
> From my understanding, this is easier in Scala [1] so I’ve been converting
> my code.
>
>
>
> One issue I’ve encountered is – How do I get a DataSet[Vector] from a
> DataStream[MyClass]?
>
> I’ve attempted to use windowing, but scala is completely new to me and I
> may need a push in the right direction.
>
>
>
> The below code executes properly, I’m just unsure of the next step.
>
>
>
>
>
> I’ve also seen an example [2] that looks like something I need to
> implement – especially the PartialModelBuilder.
>
> Am I on the right track?
>
> Thoughts?
>
>
>
> Thanks!
>
>
>
>
>
> [1] - https://stackoverflow.com/questions/44039857/is-there-a-apache-flink-machine-learning-tutorial-in-java-language/44040819#44040819
>
> [2] - https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/ml/IncrementalLearningSkeleton.scala
>
>
>
>
>
>
>
> Jeremy D. Branham
>
> Technology Architect - Sprint
> O: +1 (972) 405-2970 | M: +1 (817) 791-1627
>
> jeremy.d.bran...@sprint.com
>
> #gettingbettereveryday
>
>
>
>
> --
>
>
> This e-mail may contain Sprint proprietary information intended for the
> sole use of the recipient(s). Any use by others is prohibited. If you are
> not the intended recipient, please contact the sender and delete all copies
> of the message.
>
>
>


RE: Flink ML with DataStream

2017-07-21 Thread Branham, Jeremy [IT]
Thanks Fabian –
I’m interested in the early development of ML on streams.
Harshith and I plan on doing some prototyping for NRT anomaly detection 
leveraging the stream API.
It would be great if we could produce something reusable for the community.


From: Fabian Hueske [mailto:fhue...@gmail.com]
Sent: Wednesday, July 19, 2017 2:12 PM
To: Branham, Jeremy [IT] <jeremy.d.bran...@sprint.com>
Cc: user@flink.apache.org
Subject: Re: Flink ML with DataStream

Hi,
unfortunately, it is not possible to convert a DataStream into a DataSet.
Flink's DataSet and DataStream APIs are distinct APIs that cannot be used 
together.

The FlinkML library is only available for the DataSet API.
There is some ongoing work to add a machine learning library for streaming use 
cases as well, but this is still in an early stage and mostly focusing on model 
serving on streams, i.e., applying an externally trained model on streaming data.
Best, Fabian


2017-07-19 19:07 GMT+02:00 Branham, Jeremy [IT] 
<jeremy.d.bran...@sprint.com>:
Hello –
I’ve been successful working with Flink in Java, but have some trouble trying 
to leverage the ML library, specifically with KNN.
From my understanding, this is easier in Scala [1] so I’ve been converting my 
code.

One issue I’ve encountered is – How do I get a DataSet[Vector] from a 
DataStream[MyClass]?
I’ve attempted to use windowing, but scala is completely new to me and I may 
need a push in the right direction.

The below code executes properly, I’m just unsure of the next step.

[inline screenshot of the code omitted]

I’ve also seen an example [2] that looks like something I need to implement – 
especially the PartialModelBuilder.
Am I on the right track?
Thoughts?

Thanks!


[1] - https://stackoverflow.com/questions/44039857/is-there-a-apache-flink-machine-learning-tutorial-in-java-language/44040819#44040819
[2] - https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/ml/IncrementalLearningSkeleton.scala



Jeremy D. Branham
Technology Architect - Sprint
O: +1 (972) 405-2970 | M: +1 (817) 791-1627
jeremy.d.bran...@sprint.com
#gettingbettereveryday







Re: Flink ML with DataStream

2017-07-19 Thread Fabian Hueske
Hi,

unfortunately, it is not possible to convert a DataStream into a DataSet.
Flink's DataSet and DataStream APIs are distinct APIs that cannot be used
together.

The FlinkML library is only available for the DataSet API.
There is some ongoing work to add a machine learning library for streaming
use cases as well, but this is still in an early stage and mostly focusing
on model serving on streams, i.e., applying an externally trained model on
streaming data.

Best, Fabian


2017-07-19 19:07 GMT+02:00 Branham, Jeremy [IT] :

> Hello –
>
> I’ve been successful working with Flink in Java, but have some trouble
> trying to leverage the ML library, specifically with KNN.
>
> From my understanding, this is easier in Scala [1] so I’ve been converting
> my code.
>
>
>
> One issue I’ve encountered is – How do I get a DataSet[Vector] from a
> DataStream[MyClass]?
>
> I’ve attempted to use windowing, but scala is completely new to me and I
> may need a push in the right direction.
>
>
>
> The below code executes properly, I’m just unsure of the next step.
>
>
>
>
>
> I’ve also seen an example [2] that looks like something I need to
> implement – especially the PartialModelBuilder.
>
> Am I on the right track?
>
> Thoughts?
>
>
>
> Thanks!
>
>
>
>
>
> [1] - https://stackoverflow.com/questions/44039857/is-there-a-
> apache-flink-machine-learning-tutorial-in-java-language/44040819#44040819
>
> [2] - https://github.com/apache/flink/blob/master/flink-
> examples/flink-examples-streaming/src/main/scala/org/
> apache/flink/streaming/scala/examples/ml/IncrementalLearningSkeleton.scala
>
>
>
>
>
>
>
> Jeremy D. Branham
>
> Technology Architect - Sprint
> O: +1 (972) 405-2970 | M: +1 (817) 791-1627
>
> jeremy.d.bran...@sprint.com
>
> #gettingbettereveryday
>
>
>
> --
>
>


Flink ML with DataStream

2017-07-19 Thread Branham, Jeremy [IT]
Hello -
I've been successful working with Flink in Java, but have some trouble trying 
to leverage the ML library, specifically with KNN.
From my understanding, this is easier in Scala [1] so I've been converting my 
code.

One issue I've encountered is - How do I get a DataSet[Vector] from a 
DataStream[MyClass]?
I've attempted to use windowing, but scala is completely new to me and I may 
need a push in the right direction.

The below code executes properly, I'm just unsure of the next step.

[inline screenshot of the code omitted]

I've also seen an example [2] that looks like something I need to implement - 
especially the PartialModelBuilder.
Am I on the right track?
Thoughts?

Thanks!


[1] - 
https://stackoverflow.com/questions/44039857/is-there-a-apache-flink-machine-learning-tutorial-in-java-language/44040819#44040819
[2] - 
https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/ml/IncrementalLearningSkeleton.scala



Jeremy D. Branham
Technology Architect - Sprint
O: +1 (972) 405-2970 | M: +1 (817) 791-1627
jeremy.d.bran...@sprint.com
#gettingbettereveryday






Re: Anomaly Detection with Flink-ML

2017-07-07 Thread Branham, Jeremy [IT]
Great information!
Thanks Jonas!



-- Original message--
From: Jonas Gröger
Date: Fri, Jul 7, 2017 4:12 PM
To: user@flink.apache.org;
Cc:
Subject:Re: Anomaly Detection with Flink-ML

Hello Jeremy,

it looks like what you are looking for is map (1 in, 1 out) / flatmap (1 in,
0-n out) for preprocessing on a single element basis as well as windows for
looking at related MetricDefinition elements calculating some result.

I suggest you look into Windows
(https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/windows.html)
and basic transformations
(https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/datastream_api.html#datastream-transformations).

Regards,
Jonas



--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Anomaly-Detection-with-Flink-ML-tp14149p14151.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.





Re: Anomaly Detection with Flink-ML

2017-07-07 Thread Jonas Gröger
Hello Jeremy,

it looks like what you are looking for is map (1 in, 1 out) / flatmap (1 in,
0-n out) for preprocessing on a single element basis as well as windows for
looking at related MetricDefinition elements calculating some result.

I suggest you look into Windows
(https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/windows.html)
and basic transformations
(https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/datastream_api.html#datastream-transformations).

Regards,
Jonas
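To make the distinction Jonas draws concrete, here is a hedged sketch using plain java.util.stream collections rather than Flink's DataStream API — the 1-in-1-out vs. 1-in-0..n-out semantics are analogous, but the Flink operator calls themselves differ, and the sample metric strings are made up:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MapVsFlatMapSketch {
    public static void main(String[] args) {
        List<String> lines = Arrays.asList("cpu=90", "mem=40 cpu=70");

        // map: exactly one output element per input element (1 in, 1 out)
        List<Integer> tokenCounts = lines.stream()
                .map(line -> line.split(" ").length)
                .collect(Collectors.toList());
        System.out.println(tokenCounts); // [1, 2]

        // flatMap: zero or more output elements per input element (1 in, 0-n out)
        List<String> tokens = lines.stream()
                .flatMap(line -> Stream.of(line.split(" ")))
                .collect(Collectors.toList());
        System.out.println(tokens); // [cpu=90, mem=40, cpu=70]
    }
}
```

In Flink the same shapes appear as DataStream.map / DataStream.flatMap, with windows layered on top for the "related elements" aggregation.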



--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Anomaly-Detection-with-Flink-ML-tp14149p14151.html


Anomaly Detection with Flink-ML

2017-07-07 Thread Branham, Jeremy [IT]
Hello -
I'm working on an anomaly detector for some time series monitoring data.
I've setup an example project with Flink that reads from Kafka to get the 
monitoring data.
Unfortunately, I'm not sure what to do next.

The goal is to perform some clustering on the metric values that Flink is 
receiving and detect when the value[s] are anomalous.
I've got a DataStream  that I think needs to go through some 
pre-processing, like transforming it into a vector, but I'm not sure how to 
proceed.

The pojo[MetricDefinition] looks like this -

https://github.com/savantly-net/metric-schema/blob/master/src/main/java/net/savantly/metrics/schema/MetricDefinition.java

Can anyone point me in the right direction?

Thanks!


Jeremy D. Branham
Technology Architect - Sprint
O: +1 (972) 405-2970 | M: +1 (817) 791-1627
jeremy.d.bran...@sprint.com
#gettingbettereveryday




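The windowed approach suggested in the replies can be sketched as a simple baseline: keep a sliding window of recent metric values and flag a new value whose z-score against the window exceeds a threshold. This is a hedged plain-Java illustration, not Flink code — the window size, threshold, and metric values are all made-up examples:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ZScoreAnomalySketch {
    public static void main(String[] args) {
        double[] metric = {10, 11, 9, 10, 10, 11, 50, 10};
        int windowSize = 5;     // arbitrary illustration values
        double threshold = 3.0; // flag values more than 3 standard deviations out

        Deque<Double> window = new ArrayDeque<>();
        for (double v : metric) {
            if (window.size() == windowSize) {
                // mean and standard deviation of the current window
                double mean = window.stream().mapToDouble(Double::doubleValue).average().orElse(0);
                double var = window.stream().mapToDouble(x -> (x - mean) * (x - mean)).average().orElse(0);
                double std = Math.sqrt(var);
                if (std > 0 && Math.abs(v - mean) / std > threshold) {
                    System.out.println("anomaly: " + v); // prints: anomaly: 50.0
                }
            }
            window.addLast(v);
            if (window.size() > windowSize) window.removeFirst();
        }
    }
}
```

In a Flink job the window bookkeeping would be replaced by a keyed sliding window over the metric stream, with the mean/std check in the window function.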


Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-04-18 Thread KirstiLaurila
Answering my own question, in case someone else has similar problems: already saved
factor matrices can be read back and used in ALS like this:


// Set up the ALS learner
val als = ALS()

val users  = env.readFile(new
TypeSerializerInputFormat[Factors](createTypeInformation[Factors]),"path")
val items = env.readFile(new
TypeSerializerInputFormat[Factors](createTypeInformation[Factors]),"path")


als.factorsOption = Option((users, items))

After this, one can use als for prediction.
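Conceptually, once the user and item factor matrices are loaded, an ALS rating prediction is just the dot product of a user's factor vector and an item's factor vector. A minimal plain-Java sketch of that scoring step (independent of Flink; the factor values below are hypothetical, not read from any real model file):

```java
public class AlsScoreSketch {
    // Predicted rating = dot product of user and item latent factor vectors.
    static double predict(double[] userFactors, double[] itemFactors) {
        double score = 0.0;
        for (int k = 0; k < userFactors.length; k++) {
            score += userFactors[k] * itemFactors[k];
        }
        return score;
    }

    public static void main(String[] args) {
        double[] user = {0.5, 1.0, -0.25}; // hypothetical factors for one user
        double[] item = {2.0, 0.5, 4.0};   // hypothetical factors for one item
        System.out.println(predict(user, item)); // 1.0 + 0.5 - 1.0 = 0.5
    }
}
```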





--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-ML-1-0-0-Saving-and-Loading-Models-to-Score-a-Single-Feature-Vector-tp5766p6167.html


Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-04-12 Thread KirstiLaurila
Hi, 

those parts were examples of how I had tried. I tried with your suggestions,
but still no success. Additionally, 
there were some problems: 


val (userFactorsOpt, itemFactorsOpt) = als.factorsOption 

If I had just this, userFactorsOpt and itemFactorsOpt did not have a write
method, so I added get there, i.e.

val (userFactorsOpt, itemFactorsOpt) = als.factorsOption.get 


val factorsTypeInfo = TypeInformation.of(classOf[Factors])
val factorsSerializer = factorsTypeInfo.createSerializer(new
ExecutionConfig())
val outputFormat = new TypeSerializerOutputFormat[Factors]


Here, the factorsSerializer was not used at all, so I guess this line was
missing: 

outputFormat.setSerializer(factorsSerializer)


userFactorsOpt match {
case Some(userFactors) => userFactors.write(outputFormat, "user_path")
case None =>
}


This doesn't run because of the error message:

Error:(71, 12) constructor cannot be instantiated to expected type;
 found   : Some[A]
 required:
org.apache.flink.api.scala.DataSet[org.apache.flink.ml.recommendation.ALS.Factors]
  case Some(userFactors) => userFactorsOpt.write(outputFormat,
"path_to_my_file")

However, I still tried without the match case, i.e.

userFactorsOpt.write(outputFormat, "path")

but nothing was written anywhere.





--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-ML-1-0-0-Saving-and-Loading-Models-to-Score-a-Single-Feature-Vector-tp5766p6059.html


Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-04-12 Thread Till Rohrmann
Serializable {}
> >>>>>
> >>>>>
> >>>>> However, I will use the approach to write out the weights as text.
> >>>>>
> >>>>>
> >>>>> On Tue, Mar 29, 2016 at 5:01 AM, Till Rohrmann 
>
> > trohrmann@
>
> > 
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Gna,
> >>>>>>
> >>>>>> there are no utilities yet to do that but you can do it manually. In
> >>>>>> the end, a model is simply a Flink DataSet which you can serialize
> to
> >>>>>> some file. Upon reading this DataSet you simply have to give it to
> >>>>>> your algorithm to be used as the model. The following code snippet
> >>>>>> illustrates this approach:
> >>>>>>
> >>>>>> mlr.fit(inputDS, parameters)
> >>>>>>
> >>>>>> // write model to disk using the SerializedOutputFormat
> >>>>>> mlr.weightsOption.get.write(new
> SerializedOutputFormat[WeightVector],
> >>>>>> "path")
> >>>>>>
> >>>>>> // read the serialized model from disk
> >>>>>> val model = env.readFile(new SerializedInputFormat[WeightVector],
> >>>>>> "path")
> >>>>>>
> >>>>>> // set the read model for the MLR algorithm
> >>>>>> mlr.weightsOption = model
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Till
> >>>>>> ​
> >>>>>>
> >>>>>> On Tue, Mar 29, 2016 at 10:46 AM, Simone Robutti <
> >>>>>>
>
> > simone.robutti@
>
> >> wrote:
> >>>>>>
> >>>>>>> To my knowledge there is nothing like that. PMML is not supported
> in
> >>>>>>> any form and there's no custom saving format yet. If you really
> need
> >>>>>>> a
> >>>>>>> quick and dirty solution, it's not that hard to serialize the model
> >>>>>>> into a
> >>>>>>> file.
> >>>>>>>
> >>>>>>> 2016-03-28 17:59 GMT+02:00 Sourigna Phetsarath <
> >>>>>>>
>
> > gna.phetsarath@
>
> >>:
> >>>>>>>
> >>>>>>>> Flinksters,
> >>>>>>>>
> >>>>>>>> Is there an example of saving a Trained Model, loading a Trained
> >>>>>>>> Model and then scoring one or more feature vectors using Flink ML?
> >>>>>>>>
> >>>>>>>> All of the examples I've seen have shown only sequential fit and
> >>>>>>>> predict.
> >>>>>>>>
> >>>>>>>> Thank you.
> >>>>>>>>
> >>>>>>>> -Gna
> >>>>>>>> --
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> *Gna Phetsarath*System Architect // AOL Platforms // Data Services
> >>>>>>>> // Applied Research Chapter
> >>>>>>>> 770 Broadway, 5th Floor, New York, NY 10003
> >>>>>>>> o: 212.402.4871 // m: 917.373.7363
> >>>>>>>> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
> >>>>>>>>
> >>>>>>>> * http://www.aolplatforms.com*
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>>
> >>>>> *Gna Phetsarath*System Architect // AOL Platforms // Data Services //
> >>>>> Applied Research Chapter
> >>>>> 770 Broadway, 5th Floor, New York, NY 10003
> >>>>> o: 212.402.4871 // m: 917.373.7363
> >>>>> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
> >>>>>
> >>>>> * http://www.aolplatforms.com*
> >>>>>
> >>>>
> >>>>
> >>>
> >>
>
>
>
>
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-ML-1-0-0-Saving-and-Loading-Models-to-Score-a-Single-Feature-Vector-tp5766p6056.html
>


Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-04-12 Thread KirstiLaurila
How should this be done for the recommendation engine (that is ALS, example
here 
https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/libs/ml/als.html
).

 I am able to run the example with my example data but cannot get anything
written to any file (user or item matrices). 

Basically, I have tried something like this




I also tried to apply a similar approach to this 



but with no success. Could someone help me with this to get my model saved?


Best,
Kirsti



Trevor Grant wrote
> I'm just about to open an issue / PR solution for 'warm-starts'
> 
> Once this is in, we could just add a setter for the weight vector (and
> what
> ever iteration you're on if you're going to do more partial fits).
> 
> Then all you need to save if your weight vector (and iter number).
> 
> 
> 
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
> 
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
> 
> 
> On Fri, Apr 8, 2016 at 9:04 AM, Behrouz Derakhshan <

> behrouz.derakhshan@

>> wrote:
> 
>> Is there a reasons the Predictor or Estimator class don't have read and
>> write methods for saving and retrieving the model? I couldn't find Jira
>> issues for it. Does it make sense to create one ?
>>
>> BR,
>> Behrouz
>>
>> On Wed, Mar 30, 2016 at 4:40 PM, Till Rohrmann 

> trohrmann@

> 
>> wrote:
>>
> >>> Yes, Suneel is completely right. If the data does not implement
> >>> IOReadableWritable it is probably easier to use the
> >>> TypeSerializerOutputFormat. What you need here to serialize the data is a
>>> TypeSerializer. You can obtain it the following way:
>>>
>>> val model = mlr.weightsOption.get
>>>
>>> val weightVectorTypeInfo = TypeInformation.of(classOf[WeightVector])
>>> val weightVectorSerializer = weightVectorTypeInfo.createSerializer(new
>>> ExecutionConfig())
>>> val outputFormat = new TypeSerializerOutputFormat[WeightVector]
>>> outputFormat.setSerializer(weightVectorSerializer)
>>>
>>> model.write(outputFormat, "path")
>>>
>>> Cheers,
>>> Till
>>> ​
>>>
>>> On Tue, Mar 29, 2016 at 8:22 PM, Suneel Marthi 

> smarthi@

> 
>>> wrote:
>>>
>>>> U may want to use FlinkMLTools.persist() methods which use
>>>> TypeSerializerFormat and don't enforce IOReadableWritable.
>>>>
>>>>
>>>>
>>>> On Tue, Mar 29, 2016 at 2:12 PM, Sourigna Phetsarath <
>>>> 

> gna.phetsarath@

>> wrote:
>>>>
>>>>> Till,
>>>>>
>>>>> Thank you for your reply.
>>>>>
>>>>> Having this issue though, WeightVector does not extend
>>>>> IOReadWriteable:
>>>>>
> >>>>> public class SerializedOutputFormat<T extends
> >>>>> IOReadableWritable>
> >>>>>
> >>>>> case class WeightVector(weights: Vector, intercept: Double)
> >>>>> extends Serializable {}
>>>>>
>>>>>
>>>>> However, I will use the approach to write out the weights as text.
>>>>>
>>>>>
>>>>> On Tue, Mar 29, 2016 at 5:01 AM, Till Rohrmann 

> trohrmann@

> 
>>>>> wrote:
>>>>>
>>>>>> Hi Gna,
>>>>>>
>>>>>> there are no utilities yet to do that but you can do it manually. In
>>>>>> the end, a model is simply a Flink DataSet which you can serialize to
>>>>>> some file. Upon reading this DataSet you simply have to give it to
>>>>>> your algorithm to be used as the model. The following code snippet
>>>>>> illustrates this approach:
>>>>>>
>>>>>> mlr.fit(inputDS, parameters)
>>>>>>
>>>>>> // write model to disk using the SerializedOutputFormat
>>>>>> mlr.weightsOption.get.write(new SerializedOutputFormat[WeightVector],
>>>>>> "path")
>>>>>>
>>>>>> // read the serialized model from disk
>>>>>> val model = env.readFile(new SerializedInputFormat[WeightVector],
>>>>>> "path")
>>>>>>
>>>>>> // set the read model for the MLR algorithm
>>>>>>

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-04-08 Thread Trevor Grant
I'm just about to open an issue / PR solution for 'warm-starts'

Once this is in, we could just add a setter for the weight vector (and what
ever iteration you're on if you're going to do more partial fits).

Then all you need to save if your weight vector (and iter number).



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Fri, Apr 8, 2016 at 9:04 AM, Behrouz Derakhshan <
behrouz.derakhs...@gmail.com> wrote:

> Is there a reasons the Predictor or Estimator class don't have read and
> write methods for saving and retrieving the model? I couldn't find Jira
> issues for it. Does it make sense to create one ?
>
> BR,
> Behrouz
>
> On Wed, Mar 30, 2016 at 4:40 PM, Till Rohrmann <trohrm...@apache.org>
> wrote:
>
>> Yes, Suneel is completely right. If the data does not implement
>> IOReadableWritable it is probably easier to use the
>> TypeSerializerOutputFormat. What you need here to serialize the data is a
>> TypeSerializer. You can obtain it the following way:
>>
>> val model = mlr.weightsOption.get
>>
>> val weightVectorTypeInfo = TypeInformation.of(classOf[WeightVector])
>> val weightVectorSerializer = weightVectorTypeInfo.createSerializer(new 
>> ExecutionConfig())
>> val outputFormat = new TypeSerializerOutputFormat[WeightVector]
>> outputFormat.setSerializer(weightVectorSerializer)
>>
>> model.write(outputFormat, "path")
>>
>> Cheers,
>> Till
>> ​
>>
>> On Tue, Mar 29, 2016 at 8:22 PM, Suneel Marthi <smar...@apache.org>
>> wrote:
>>
>>> U may want to use FlinkMLTools.persist() methods which use
>>> TypeSerializerFormat and don't enforce IOReadableWritable.
>>>
>>>
>>>
>>> On Tue, Mar 29, 2016 at 2:12 PM, Sourigna Phetsarath <
>>> gna.phetsar...@teamaol.com> wrote:
>>>
>>>> Till,
>>>>
>>>> Thank you for your reply.
>>>>
>>>> Having this issue though, WeightVector does not extend IOReadWriteable:
>>>>
>>>> public class SerializedOutputFormat<T extends
>>>> IOReadableWritable>
>>>>
>>>> case class WeightVector(weights: Vector, intercept: Double)
>>>> extends Serializable {}
>>>>
>>>>
>>>> However, I will use the approach to write out the weights as text.
>>>>
>>>>
>>>> On Tue, Mar 29, 2016 at 5:01 AM, Till Rohrmann <trohrm...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Gna,
>>>>>
>>>>> there are no utilities yet to do that but you can do it manually. In
>>>>> the end, a model is simply a Flink DataSet which you can serialize to
>>>>> some file. Upon reading this DataSet you simply have to give it to
>>>>> your algorithm to be used as the model. The following code snippet
>>>>> illustrates this approach:
>>>>>
>>>>> mlr.fit(inputDS, parameters)
>>>>>
>>>>> // write model to disk using the SerializedOutputFormat
>>>>> mlr.weightsOption.get.write(new SerializedOutputFormat[WeightVector], 
>>>>> "path")
>>>>>
>>>>> // read the serialized model from disk
>>>>> val model = env.readFile(new SerializedInputFormat[WeightVector], "path")
>>>>>
>>>>> // set the read model for the MLR algorithm
>>>>> mlr.weightsOption = model
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>> ​
>>>>>
>>>>> On Tue, Mar 29, 2016 at 10:46 AM, Simone Robutti <
>>>>> simone.robu...@radicalbit.io> wrote:
>>>>>
>>>>>> To my knowledge there is nothing like that. PMML is not supported in
>>>>>> any form and there's no custom saving format yet. If you really need a
>>>>>> quick and dirty solution, it's not that hard to serialize the model into 
>>>>>> a
>>>>>> file.
>>>>>>
>>>>>> 2016-03-28 17:59 GMT+02:00 Sourigna Phetsarath <
>>>>>> gna.phetsar...@teamaol.com>:
>>>>>>
>>>>>>> Flinksters,
>>>>>>>
>>>>>>> Is there an example of saving a Trained Model, loading a Trained
>>>>>>> Model and then scoring one or more feature vectors using Flink ML?
>>>>>>>
>>>>>>> All of the examples I've seen have shown only sequential fit and
>>>>>>> predict.
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> -Gna
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>> *Gna Phetsarath*System Architect // AOL Platforms // Data Services
>>>>>>> // Applied Research Chapter
>>>>>>> 770 Broadway, 5th Floor, New York, NY 10003
>>>>>>> o: 212.402.4871 // m: 917.373.7363
>>>>>>> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
>>>>>>>
>>>>>>> * <http://www.aolplatforms.com>*
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> *Gna Phetsarath*System Architect // AOL Platforms // Data Services //
>>>> Applied Research Chapter
>>>> 770 Broadway, 5th Floor, New York, NY 10003
>>>> o: 212.402.4871 // m: 917.373.7363
>>>> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
>>>>
>>>> * <http://www.aolplatforms.com>*
>>>>
>>>
>>>
>>
>


Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-04-08 Thread Behrouz Derakhshan
Is there a reasons the Predictor or Estimator class don't have read and
write methods for saving and retrieving the model? I couldn't find Jira
issues for it. Does it make sense to create one ?

BR,
Behrouz

On Wed, Mar 30, 2016 at 4:40 PM, Till Rohrmann <trohrm...@apache.org> wrote:

> Yes, Suneel is completely right. If the data does not implement
> IOReadableWritable it is probably easier to use the
> TypeSerializerOutputFormat. What you need here to serialize the data is a
> TypeSerializer. You can obtain it the following way:
>
> val model = mlr.weightsOption.get
>
> val weightVectorTypeInfo = TypeInformation.of(classOf[WeightVector])
> val weightVectorSerializer = weightVectorTypeInfo.createSerializer(new 
> ExecutionConfig())
> val outputFormat = new TypeSerializerOutputFormat[WeightVector]
> outputFormat.setSerializer(weightVectorSerializer)
>
> model.write(outputFormat, "path")
>
> Cheers,
> Till
> ​
>
> On Tue, Mar 29, 2016 at 8:22 PM, Suneel Marthi <smar...@apache.org> wrote:
>
>> U may want to use FlinkMLTools.persist() methods which use
>> TypeSerializerFormat and don't enforce IOReadableWritable.
>>
>>
>>
>> On Tue, Mar 29, 2016 at 2:12 PM, Sourigna Phetsarath <
>> gna.phetsar...@teamaol.com> wrote:
>>
>>> Till,
>>>
>>> Thank you for your reply.
>>>
>>> Having this issue though, WeightVector does not extend IOReadWriteable:
>>>
>>> public class SerializedOutputFormat<T extends IOReadableWritable>
>>>
>>> case class WeightVector(weights: Vector, intercept: Double)
>>> extends Serializable {}
>>>
>>>
>>> However, I will use the approach to write out the weights as text.
>>>
>>>
>>> On Tue, Mar 29, 2016 at 5:01 AM, Till Rohrmann <trohrm...@apache.org>
>>> wrote:
>>>
>>>> Hi Gna,
>>>>
>>>> there are no utilities yet to do that but you can do it manually. In
>>>> the end, a model is simply a Flink DataSet which you can serialize to
>>>> some file. Upon reading this DataSet you simply have to give it to
>>>> your algorithm to be used as the model. The following code snippet
>>>> illustrates this approach:
>>>>
>>>> mlr.fit(inputDS, parameters)
>>>>
>>>> // write model to disk using the SerializedOutputFormat
>>>> mlr.weightsOption.get.write(new SerializedOutputFormat[WeightVector], 
>>>> "path")
>>>>
>>>> // read the serialized model from disk
>>>> val model = env.readFile(new SerializedInputFormat[WeightVector], "path")
>>>>
>>>> // set the read model for the MLR algorithm
>>>> mlr.weightsOption = model
>>>>
>>>> Cheers,
>>>> Till
>>>> ​
>>>>
>>>> On Tue, Mar 29, 2016 at 10:46 AM, Simone Robutti <
>>>> simone.robu...@radicalbit.io> wrote:
>>>>
>>>>> To my knowledge there is nothing like that. PMML is not supported in
>>>>> any form and there's no custom saving format yet. If you really need a
>>>>> quick and dirty solution, it's not that hard to serialize the model into a
>>>>> file.
>>>>>
>>>>> 2016-03-28 17:59 GMT+02:00 Sourigna Phetsarath <
>>>>> gna.phetsar...@teamaol.com>:
>>>>>
>>>>>> Flinksters,
>>>>>>
>>>>>> Is there an example of saving a Trained Model, loading a Trained
>>>>>> Model and then scoring one or more feature vectors using Flink ML?
>>>>>>
>>>>>> All of the examples I've seen have shown only sequential fit and
>>>>>> predict.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> -Gna
>>>>>> --
>>>>>>
>>>>>>
>>>>>> *Gna Phetsarath*System Architect // AOL Platforms // Data Services
>>>>>> // Applied Research Chapter
>>>>>> 770 Broadway, 5th Floor, New York, NY 10003
>>>>>> o: 212.402.4871 // m: 917.373.7363
>>>>>> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
>>>>>>
>>>>>> * <http://www.aolplatforms.com>*
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>>
>>> *Gna Phetsarath*System Architect // AOL Platforms // Data Services //
>>> Applied Research Chapter
>>> 770 Broadway, 5th Floor, New York, NY 10003
>>> o: 212.402.4871 // m: 917.373.7363
>>> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
>>>
>>> * <http://www.aolplatforms.com>*
>>>
>>
>>
>


Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-30 Thread Till Rohrmann
Yes, Suneel is completely right. If the data does not implement
IOReadableWritable it is probably easier to use the
TypeSerializerOutputFormat. What you need here to serialize the data is a
TypeSerializer. You can obtain it the following way:

val model = mlr.weightsOption.get

val weightVectorTypeInfo = TypeInformation.of(classOf[WeightVector])
val weightVectorSerializer = weightVectorTypeInfo.createSerializer(new
ExecutionConfig())
val outputFormat = new TypeSerializerOutputFormat[WeightVector]
outputFormat.setSerializer(weightVectorSerializer)

model.write(outputFormat, "path")

Cheers,
Till
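For completeness, the read side mirrors this write path: a TypeSerializerInputFormat parameterized with the same type information. A minimal sketch, assuming the Scala DataSet API and that the input format's constructor takes a `TypeInformation` (the exact signature has varied across Flink releases):

```scala
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.java.io.TypeSerializerInputFormat
import org.apache.flink.api.scala._
import org.apache.flink.ml.common.WeightVector

val env = ExecutionEnvironment.getExecutionEnvironment

// Same type information as on the write side, so the records are
// deserialized with the identical TypeSerializer.
val weightVectorTypeInfo = TypeInformation.of(classOf[WeightVector])
val inputFormat = new TypeSerializerInputFormat[WeightVector](weightVectorTypeInfo)

// Read the serialized model back from disk...
val model = env.readFile(inputFormat, "path")
// ...and hand it to the algorithm, e.g. mlr.weightsOption = Some(model)
```

The key point is that the serializer is derived from the same `TypeInformation` on both sides; mixing serializers between write and read would corrupt the records.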
​

On Tue, Mar 29, 2016 at 8:22 PM, Suneel Marthi <smar...@apache.org> wrote:

> U may want to use FlinkMLTools.persist() methods which use
> TypeSerializerFormat and don't enforce IOReadableWritable.
>
>
>
> On Tue, Mar 29, 2016 at 2:12 PM, Sourigna Phetsarath <
> gna.phetsar...@teamaol.com> wrote:
>
>> Till,
>>
>> Thank you for your reply.
>>
>> Having this issue though: WeightVector does not extend IOReadableWritable:
>>
>> public class SerializedOutputFormat<T extends IOReadableWritable>
>>
>> case class WeightVector(weights: Vector, intercept: Double) extends
>> Serializable {}
>>
>>
>> However, I will use the approach to write out the weights as text.
>>
>>
>> On Tue, Mar 29, 2016 at 5:01 AM, Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>>> Hi Gna,
>>>
>>> there are no utilities yet to do that but you can do it manually. In the
>>> end, a model is simply a Flink DataSet which you can serialize to some
>>> file. Upon reading this DataSet you simply have to give it to your
>>> algorithm to be used as the model. The following code snippet illustrates
>>> this approach:
>>>
>>> mlr.fit(inputDS, parameters)
>>>
>>> // write model to disk using the SerializedOutputFormat
>>> mlr.weightsOption.get.write(new SerializedOutputFormat[WeightVector], 
>>> "path")
>>>
>>> // read the serialized model from disk
>>> val model = env.readFile(new SerializedInputFormat[WeightVector], "path")
>>>
>>> // set the read model for the MLR algorithm
>>> mlr.weightsOption = model
>>>
>>> Cheers,
>>> Till
>>> ​
>>>
>>> On Tue, Mar 29, 2016 at 10:46 AM, Simone Robutti <
>>> simone.robu...@radicalbit.io> wrote:
>>>
>>>> To my knowledge there is nothing like that. PMML is not supported in
>>>> any form and there's no custom saving format yet. If you really need a
>>>> quick and dirty solution, it's not that hard to serialize the model into a
>>>> file.
>>>>
>>>> 2016-03-28 17:59 GMT+02:00 Sourigna Phetsarath <
>>>> gna.phetsar...@teamaol.com>:
>>>>
>>>>> Flinksters,
>>>>>
>>>>> Is there an example of saving a Trained Model, loading a Trained Model
>>>>> and then scoring one or more feature vectors using Flink ML?
>>>>>
>>>>> All of the examples I've seen have shown only sequential fit and
>>>>> predict.
>>>>>
>>>>> Thank you.
>>>>>
>>>>> -Gna
>>>>> --
>>>>>
>>>>>
>>>>> *Gna Phetsarath*System Architect // AOL Platforms // Data Services //
>>>>> Applied Research Chapter
>>>>> 770 Broadway, 5th Floor, New York, NY 10003
>>>>> o: 212.402.4871 // m: 917.373.7363
>>>>> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
>>>>>
>>>>> * <http://www.aolplatforms.com>*
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>>
>>
>> *Gna Phetsarath*System Architect // AOL Platforms // Data Services //
>> Applied Research Chapter
>> 770 Broadway, 5th Floor, New York, NY 10003
>> o: 212.402.4871 // m: 917.373.7363
>> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
>>
>> * <http://www.aolplatforms.com>*
>>
>
>


Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Suneel Marthi
U may want to use FlinkMLTools.persist() methods which use
TypeSerializerFormat and don't enforce IOReadableWritable.



On Tue, Mar 29, 2016 at 2:12 PM, Sourigna Phetsarath <
gna.phetsar...@teamaol.com> wrote:

> Till,
>
> Thank you for your reply.
>
> Having this issue though: WeightVector does not extend IOReadableWritable:
>
> public class SerializedOutputFormat<T extends IOReadableWritable>
>
> case class WeightVector(weights: Vector, intercept: Double) extends
> Serializable {}
>
>
> However, I will use the approach to write out the weights as text.
>
>
> On Tue, Mar 29, 2016 at 5:01 AM, Till Rohrmann <trohrm...@apache.org>
> wrote:
>
>> Hi Gna,
>>
>> there are no utilities yet to do that but you can do it manually. In the
>> end, a model is simply a Flink DataSet which you can serialize to some
>> file. Upon reading this DataSet you simply have to give it to your
>> algorithm to be used as the model. The following code snippet illustrates
>> this approach:
>>
>> mlr.fit(inputDS, parameters)
>>
>> // write model to disk using the SerializedOutputFormat
>> mlr.weightsOption.get.write(new SerializedOutputFormat[WeightVector], "path")
>>
>> // read the serialized model from disk
>> val model = env.readFile(new SerializedInputFormat[WeightVector], "path")
>>
>> // set the read model for the MLR algorithm
>> mlr.weightsOption = model
>>
>> Cheers,
>> Till
>> ​
>>
>> On Tue, Mar 29, 2016 at 10:46 AM, Simone Robutti <
>> simone.robu...@radicalbit.io> wrote:
>>
>>> To my knowledge there is nothing like that. PMML is not supported in any
>>> form and there's no custom saving format yet. If you really need a quick
>>> and dirty solution, it's not that hard to serialize the model into a file.
>>>
>>> 2016-03-28 17:59 GMT+02:00 Sourigna Phetsarath <
>>> gna.phetsar...@teamaol.com>:
>>>
>>>> Flinksters,
>>>>
>>>> Is there an example of saving a Trained Model, loading a Trained Model
>>>> and then scoring one or more feature vectors using Flink ML?
>>>>
>>>> All of the examples I've seen have shown only sequential fit and
>>>> predict.
>>>>
>>>> Thank you.
>>>>
>>>> -Gna
>>>> --
>>>>
>>>>
>>>> *Gna Phetsarath*System Architect // AOL Platforms // Data Services //
>>>> Applied Research Chapter
>>>> 770 Broadway, 5th Floor, New York, NY 10003
>>>> o: 212.402.4871 // m: 917.373.7363
>>>> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
>>>>
>>>> * <http://www.aolplatforms.com>*
>>>>
>>>
>>>
>>
>
>
> --
>
>
> *Gna Phetsarath*System Architect // AOL Platforms // Data Services //
> Applied Research Chapter
> 770 Broadway, 5th Floor, New York, NY 10003
> o: 212.402.4871 // m: 917.373.7363
> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
>
> * <http://www.aolplatforms.com>*
>


Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Sourigna Phetsarath
Till,

Thank you for your reply.

Having this issue though: WeightVector does not extend IOReadableWritable:

public class SerializedOutputFormat<T extends IOReadableWritable>

case class WeightVector(weights: Vector, intercept: Double) extends
Serializable {}


However, I will use the approach to write out the weights as text.


On Tue, Mar 29, 2016 at 5:01 AM, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Gna,
>
> there are no utilities yet to do that but you can do it manually. In the
> end, a model is simply a Flink DataSet which you can serialize to some
> file. Upon reading this DataSet you simply have to give it to your
> algorithm to be used as the model. The following code snippet illustrates
> this approach:
>
> mlr.fit(inputDS, parameters)
>
> // write model to disk using the SerializedOutputFormat
> mlr.weightsOption.get.write(new SerializedOutputFormat[WeightVector], "path")
>
> // read the serialized model from disk
> val model = env.readFile(new SerializedInputFormat[WeightVector], "path")
>
> // set the read model for the MLR algorithm
> mlr.weightsOption = model
>
> Cheers,
> Till
> ​
>
> On Tue, Mar 29, 2016 at 10:46 AM, Simone Robutti <
> simone.robu...@radicalbit.io> wrote:
>
>> To my knowledge there is nothing like that. PMML is not supported in any
>> form and there's no custom saving format yet. If you really need a quick
>> and dirty solution, it's not that hard to serialize the model into a file.
>>
>> 2016-03-28 17:59 GMT+02:00 Sourigna Phetsarath <
>> gna.phetsar...@teamaol.com>:
>>
>>> Flinksters,
>>>
>>> Is there an example of saving a Trained Model, loading a Trained Model
>>> and then scoring one or more feature vectors using Flink ML?
>>>
>>> All of the examples I've seen have shown only sequential fit and predict.
>>>
>>> Thank you.
>>>
>>> -Gna
>>> --
>>>
>>>
>>> *Gna Phetsarath*System Architect // AOL Platforms // Data Services //
>>> Applied Research Chapter
>>> 770 Broadway, 5th Floor, New York, NY 10003
>>> o: 212.402.4871 // m: 917.373.7363
>>> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
>>>
>>> * <http://www.aolplatforms.com>*
>>>
>>
>>
>


-- 


Gna Phetsarath
System Architect // AOL Platforms // Data Services // Applied Research Chapter
770 Broadway, 5th Floor, New York, NY 10003
o: 212.402.4871 // m: 917.373.7363
vvmr: 8890237 aim: sphetsarath20 t: @sourigna

<http://www.aolplatforms.com>


Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Till Rohrmann
Hi Gna,

there are no utilities yet to do that but you can do it manually. In the
end, a model is simply a Flink DataSet which you can serialize to some
file. Upon reading this DataSet you simply have to give it to your
algorithm to be used as the model. The following code snippet illustrates
this approach:

mlr.fit(inputDS, parameters)

// write model to disk using the SerializedOutputFormat
mlr.weightsOption.get.write(new SerializedOutputFormat[WeightVector], "path")

// read the serialized model from disk
val model = env.readFile(new SerializedInputFormat[WeightVector], "path")

// set the read model for the MLR algorithm
mlr.weightsOption = model

Cheers,
Till
​

On Tue, Mar 29, 2016 at 10:46 AM, Simone Robutti <
simone.robu...@radicalbit.io> wrote:

> To my knowledge there is nothing like that. PMML is not supported in any
> form and there's no custom saving format yet. If you really need a quick
> and dirty solution, it's not that hard to serialize the model into a file.
>
> 2016-03-28 17:59 GMT+02:00 Sourigna Phetsarath <gna.phetsar...@teamaol.com
> >:
>
>> Flinksters,
>>
>> Is there an example of saving a Trained Model, loading a Trained Model
>> and then scoring one or more feature vectors using Flink ML?
>>
>> All of the examples I've seen have shown only sequential fit and predict.
>>
>> Thank you.
>>
>> -Gna
>> --
>>
>>
>> *Gna Phetsarath*System Architect // AOL Platforms // Data Services //
>> Applied Research Chapter
>> 770 Broadway, 5th Floor, New York, NY 10003
>> o: 212.402.4871 // m: 917.373.7363
>> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
>>
>> * <http://www.aolplatforms.com>*
>>
>
>


Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Simone Robutti
To my knowledge there is nothing like that. PMML is not supported in any
form and there's no custom saving format yet. If you really need a quick
and dirty solution, it's not that hard to serialize the model into a file.
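The "quick and dirty" route can be sketched with plain Java serialization. This is an illustration only: `Model` below is a hypothetical stand-in for whatever trained model type you have (flink-ml's actual `WeightVector` holds a `Vector` plus an intercept), and no Flink API is involved — you would `collect()` the model DataSet first.

```scala
import java.io._

// Hypothetical stand-in for a trained model; any Serializable case class
// works the same way.
case class Model(weights: Array[Double], intercept: Double) extends Serializable

// Write the model object to a local file via Java serialization.
def saveModel(model: Model, path: String): Unit = {
  val out = new ObjectOutputStream(new FileOutputStream(path))
  try out.writeObject(model) finally out.close()
}

// Read it back and cast to the expected type.
def loadModel(path: String): Model = {
  val in = new ObjectInputStream(new FileInputStream(path))
  try in.readObject().asInstanceOf[Model] finally in.close()
}
```

This keeps the model out of any Flink-specific format, at the cost of Java serialization's usual fragility across class versions.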

2016-03-28 17:59 GMT+02:00 Sourigna Phetsarath <gna.phetsar...@teamaol.com>:

> Flinksters,
>
> Is there an example of saving a Trained Model, loading a Trained Model and
> then scoring one or more feature vectors using Flink ML?
>
> All of the examples I've seen have shown only sequential fit and predict.
>
> Thank you.
>
> -Gna
> --
>
>
> *Gna Phetsarath*System Architect // AOL Platforms // Data Services //
> Applied Research Chapter
> 770 Broadway, 5th Floor, New York, NY 10003
> o: 212.402.4871 // m: 917.373.7363
> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
>
> * <http://www.aolplatforms.com>*
>


Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-28 Thread Sourigna Phetsarath
Flinksters,

Is there an example of saving a Trained Model, loading a Trained Model and
then scoring one or more feature vectors using Flink ML?

All of the examples I've seen have shown only sequential fit and predict.

Thank you.

-Gna
-- 


Gna Phetsarath
System Architect // AOL Platforms // Data Services // Applied Research Chapter
770 Broadway, 5th Floor, New York, NY 10003
o: 212.402.4871 // m: 917.373.7363
vvmr: 8890237 aim: sphetsarath20 t: @sourigna

<http://www.aolplatforms.com>


Re: Scala Breeze Dependencies not resolving when adding flink-ml on build.sbt

2015-10-28 Thread Anwar Rizal
Yeah

I had similar problems with kafka in spark streaming. I worked around the
problem by excluding kafka from connector and then adding the library back.

Maybe you can try something like:

libraryDependencies ++= Seq("org.apache.flink" % "flink-scala" % "0.9.1",
"org.apache.flink" % "flink-clients" % "0.9.1" ,"org.apache.flink" %
"flink-ml" % "0.9.1"  exclude("org.scalanlp",
"breeze_${scala.binary.version}"))

libraryDependencies += "org.scalanlp" % "breeze_2.10" % "0.11.2"

Anwar.




On Wed, Oct 28, 2015 at 11:29 AM, Frederick Ayala <frederickay...@gmail.com>
wrote:

> I tried adding libraryDependencies += "org.scalanlp" % "breeze_2.10" %
> "0.11.2" but the problem persists.
>
> I also tried as explained in the Breeze documentation:
>
> libraryDependencies  ++= Seq(
>   "org.scalanlp" %% "breeze" % "0.11.2",
>   "org.scalanlp" %% "breeze-natives" % "0.11.2",
>   "org.scalanlp" %% "breeze-viz" % "0.11.2"
> )
>
> resolvers ++= Seq("Sonatype Releases" at "
> https://oss.sonatype.org/content/repositories/releases/")
>
> But it doesn't work.
>
> The message is still "unresolved dependency:
> org.scalanlp#breeze_${scala.binary.version};0.11.2: not found"
>
> Could the problem be on flink-ml/pom.xml?
>
> <dependency>
>   <groupId>org.scalanlp</groupId>
>   <artifactId>breeze_${scala.binary.version}</artifactId>
>   <version>0.11.2</version>
> </dependency>
>
> The property scala.binary.version is not being replaced by the value 2.10
>
> Thanks,
>
> Frederick Ayala
>
> On Wed, Oct 28, 2015 at 10:59 AM, DEVAN M.S. <msdeva...@gmail.com> wrote:
>
>> Can you add libraryDependencies += "org.scalanlp" % "breeze_2.10" %
>> "0.11.2" also ?
>>
>>
>>
>> Devan M.S. | Technical Lead | Cyber Security | AMRITA VISHWA VIDYAPEETHAM
>> | Amritapuri | Cell +919946535290 |
>> [image: View DEVAN M S's profile on LinkedIn]
>> <https://in.linkedin.com/pub/devan-m-s/17/373/574>
>>
>>
>> On Wed, Oct 28, 2015 at 3:04 PM, Frederick Ayala <
>> frederickay...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am getting an error when adding flink-ml to the libraryDependencies on
>>> my build.sbt file:
>>>
>>> [error] (*:update) sbt.ResolveException: unresolved dependency:
>>> org.scalanlp#breeze_${scala.binary.version};0.11.2: not found
>>>
>>> My libraryDependencies is:
>>>
>>> libraryDependencies ++= Seq("org.apache.flink" % "flink-scala" %
>>> "0.9.1", "org.apache.flink" % "flink-streaming-scala" % "0.9.1",
>>> "org.apache.flink" % "flink-clients" % "0.9.1",
>>> "org.apache.flink" % "flink-ml" % "0.9.1")
>>>
>>> I am using scalaVersion := "2.10.6"
>>>
>>> If I remove flink-ml all the other dependencies are resolved.
>>>
>>> Could you help me to figure out a solution for this?
>>>
>>> Thanks!
>>>
>>> Frederick Ayala
>>>
>>
>>
>
>
> --
> Frederick Ayala
>


Re: Scala Breeze Dependencies not resolving when adding flink-ml on build.sbt

2015-10-28 Thread Theodore Vasiloudis
This sounds similar to this problem:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-ML-as-Dependency-td1582.html

The reason is (quoting Till, replace gradle with sbt here):

the flink-ml pom contains as a dependency an artifact with artifactId
> breeze_${scala.binary.version}. The variable scala.binary.version is
> defined in the parent pom and not substituted when flink-ml is installed.
> Therefore gradle tries to find a dependency with the name
> breeze_${scala.binary.version}


Anwar's solution should work; I just tested it on a basic Flink build, but
I haven't tried running anything yet.
The resolution error does go away though. So your sbt should include
something like:

libraryDependencies ++= Seq(
  "org.apache.flink" % "flink-scala" % flinkVersion,
  "org.apache.flink" % "flink-clients" % flinkVersion,
  ("org.apache.flink" % "flink-ml" % flinkVersion)
.exclude("org.scalanlp", "breeze_${scala.binary.version}"),
  "org.scalanlp" %% "breeze" % "0.11.2")



On Wed, Oct 28, 2015 at 10:34 AM, Frederick Ayala <frederickay...@gmail.com>
wrote:

> Hi,
>
> I am getting an error when adding flink-ml to the libraryDependencies on
> my build.sbt file:
>
> [error] (*:update) sbt.ResolveException: unresolved dependency:
> org.scalanlp#breeze_${scala.binary.version};0.11.2: not found
>
> My libraryDependencies is:
>
> libraryDependencies ++= Seq("org.apache.flink" % "flink-scala" % "0.9.1",
> "org.apache.flink" % "flink-streaming-scala" % "0.9.1", "org.apache.flink"
> % "flink-clients" % "0.9.1",
> "org.apache.flink" % "flink-ml" % "0.9.1")
>
> I am using scalaVersion := "2.10.6"
>
> If I remove flink-ml all the other dependencies are resolved.
>
> Could you help me to figure out a solution for this?
>
> Thanks!
>
> Frederick Ayala
>


Re: Scala Breeze Dependencies not resolving when adding flink-ml on build.sbt

2015-10-28 Thread Frederick Ayala
Thank you Anwar! That did the trick :)

On Wed, Oct 28, 2015 at 1:30 PM, Anwar Rizal <anriza...@gmail.com> wrote:

> Yeah
>
> I had similar problems with kafka in spark streaming. I worked around the
> problem by excluding kafka from connector and then adding the library back.
>
> Maybe you can try something like:
>
> libraryDependencies ++= Seq("org.apache.flink" % "flink-scala" % "0.9.1",
> "org.apache.flink" % "flink-clients" % "0.9.1" ,"org.apache.flink" %
> "flink-ml" % "0.9.1"  exclude("org.scalanlp",
> "breeze_${scala.binary.version}"))
>
> libraryDependencies += "org.scalanlp" % "breeze_2.10" % "0.11.2"
>
> Anwar.
>
>
>
>
> On Wed, Oct 28, 2015 at 11:29 AM, Frederick Ayala <
> frederickay...@gmail.com> wrote:
>
>> I tried adding libraryDependencies += "org.scalanlp" % "breeze_2.10" %
>> "0.11.2" but the problem persists.
>>
>> I also tried as explained in the Breeze documentation:
>>
>> libraryDependencies  ++= Seq(
>>   "org.scalanlp" %% "breeze" % "0.11.2",
>>   "org.scalanlp" %% "breeze-natives" % "0.11.2",
>>   "org.scalanlp" %% "breeze-viz" % "0.11.2"
>> )
>>
>> resolvers ++= Seq("Sonatype Releases" at "
>> https://oss.sonatype.org/content/repositories/releases/")
>>
>> But it doesn't work.
>>
>> The message is still "unresolved dependency:
>> org.scalanlp#breeze_${scala.binary.version};0.11.2: not found"
>>
>> Could the problem be on flink-ml/pom.xml?
>>
>> <dependency>
>>   <groupId>org.scalanlp</groupId>
>>   <artifactId>breeze_${scala.binary.version}</artifactId>
>>   <version>0.11.2</version>
>> </dependency>
>>
>> The property scala.binary.version is not being replaced by the value 2.10
>>
>> Thanks,
>>
>> Frederick Ayala
>>
>> On Wed, Oct 28, 2015 at 10:59 AM, DEVAN M.S. <msdeva...@gmail.com> wrote:
>>
>>> Can you add libraryDependencies += "org.scalanlp" % "breeze_2.10" %
>>> "0.11.2" also ?
>>>
>>>
>>>
>>> Devan M.S. | Technical Lead | Cyber Security | AMRITA VISHWA
>>> VIDYAPEETHAM | Amritapuri | Cell +919946535290 |
>>> [image: View DEVAN M S's profile on LinkedIn]
>>> <https://in.linkedin.com/pub/devan-m-s/17/373/574>
>>>
>>>
>>> On Wed, Oct 28, 2015 at 3:04 PM, Frederick Ayala <
>>> frederickay...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am getting an error when adding flink-ml to the libraryDependencies
>>>> on my build.sbt file:
>>>>
>>>> [error] (*:update) sbt.ResolveException: unresolved dependency:
>>>> org.scalanlp#breeze_${scala.binary.version};0.11.2: not found
>>>>
>>>> My libraryDependencies is:
>>>>
>>>> libraryDependencies ++= Seq("org.apache.flink" % "flink-scala" %
>>>> "0.9.1", "org.apache.flink" % "flink-streaming-scala" % "0.9.1",
>>>> "org.apache.flink" % "flink-clients" % "0.9.1",
>>>> "org.apache.flink" % "flink-ml" % "0.9.1")
>>>>
>>>> I am using scalaVersion := "2.10.6"
>>>>
>>>> If I remove flink-ml all the other dependencies are resolved.
>>>>
>>>> Could you help me to figure out a solution for this?
>>>>
>>>> Thanks!
>>>>
>>>> Frederick Ayala
>>>>
>>>
>>>
>>
>>
>> --
>> Frederick Ayala
>>
>
>


-- 
Frederick Ayala


Re: Scala Breeze Dependencies not resolving when adding flink-ml on build.sbt

2015-10-28 Thread Frederick Ayala
Thanks Theodore. I can confirm that Anwar solution worked. My build.sbt
looks like:

libraryDependencies  ++= Seq(
  "org.scalanlp" %% "breeze" % "0.11.2",
  "org.scalanlp" %% "breeze-natives" % "0.11.2",
  "org.scalanlp" %% "breeze-viz" % "0.11.2"
)

libraryDependencies ++= Seq("org.apache.flink" % "flink-scala" % "0.9.1",
"org.apache.flink" % "flink-streaming-scala" % "0.9.1",
"org.apache.flink" % "flink-clients" % "0.9.1",
  "org.apache.flink" % "flink-ml" % "0.9.1"
exclude("org.scalanlp", "breeze_${scala.binary.version}")
)

resolvers ++= Seq("Sonatype Releases" at "
https://oss.sonatype.org/content/repositories/releases/")



On Wed, Oct 28, 2015 at 1:41 PM, Theodore Vasiloudis <
theodoros.vasilou...@gmail.com> wrote:

> This sounds similar to this problem:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-ML-as-Dependency-td1582.html
>
> The reason is (quoting Till, replace gradle with sbt here):
>
> the flink-ml pom contains as a dependency an artifact with artifactId
>> breeze_${scala.binary.version}. The variable scala.binary.version is
>> defined in the parent pom and not substituted when flink-ml is installed.
>> Therefore gradle tries to find a dependency with the name
>> breeze_${scala.binary.version}
>
>
> Anwar's solution should work, I just tested it on a basic Flink build, but
> I haven't tried running anything yet.
> The resolution error does go away though. So your sbt should include
> something like:
>
> libraryDependencies ++= Seq(
>   "org.apache.flink" % "flink-scala" % flinkVersion,
>   "org.apache.flink" % "flink-clients" % flinkVersion,
>   ("org.apache.flink" % "flink-ml" % flinkVersion)
> .exclude("org.scalanlp", "breeze_${scala.binary.version}"),
>   "org.scalanlp" %% "breeze" % "0.11.2")
>
>
>
> On Wed, Oct 28, 2015 at 10:34 AM, Frederick Ayala <
> frederickay...@gmail.com> wrote:
>
>> Hi,
>>
>> I am getting an error when adding flink-ml to the libraryDependencies on
>> my build.sbt file:
>>
>> [error] (*:update) sbt.ResolveException: unresolved dependency:
>> org.scalanlp#breeze_${scala.binary.version};0.11.2: not found
>>
>> My libraryDependencies is:
>>
>> libraryDependencies ++= Seq("org.apache.flink" % "flink-scala" % "0.9.1",
>> "org.apache.flink" % "flink-streaming-scala" % "0.9.1", "org.apache.flink"
>> % "flink-clients" % "0.9.1",
>> "org.apache.flink" % "flink-ml" % "0.9.1")
>>
>> I am using scalaVersion := "2.10.6"
>>
>> If I remove flink-ml all the other dependencies are resolved.
>>
>> Could you help me to figure out a solution for this?
>>
>> Thanks!
>>
>> Frederick Ayala
>>
>
>


-- 
Frederick Ayala


Re: Scala Breeze Dependencies not resolving when adding flink-ml on build.sbt

2015-10-28 Thread DEVAN M.S.
Can you add libraryDependencies += "org.scalanlp" % "breeze_2.10" %
"0.11.2" also ?



Devan M.S. | Technical Lead | Cyber Security | AMRITA VISHWA VIDYAPEETHAM |
Amritapuri | Cell +919946535290 |
[image: View DEVAN M S's profile on LinkedIn]
<https://in.linkedin.com/pub/devan-m-s/17/373/574>


On Wed, Oct 28, 2015 at 3:04 PM, Frederick Ayala <frederickay...@gmail.com>
wrote:

> Hi,
>
> I am getting an error when adding flink-ml to the libraryDependencies on
> my build.sbt file:
>
> [error] (*:update) sbt.ResolveException: unresolved dependency:
> org.scalanlp#breeze_${scala.binary.version};0.11.2: not found
>
> My libraryDependencies is:
>
> libraryDependencies ++= Seq("org.apache.flink" % "flink-scala" % "0.9.1",
> "org.apache.flink" % "flink-streaming-scala" % "0.9.1", "org.apache.flink"
> % "flink-clients" % "0.9.1",
> "org.apache.flink" % "flink-ml" % "0.9.1")
>
> I am using scalaVersion := "2.10.6"
>
> If I remove flink-ml all the other dependencies are resolved.
>
> Could you help me to figure out a solution for this?
>
> Thanks!
>
> Frederick Ayala
>


Fwd: Scala Breeze Dependencies not resolving when adding flink-ml on build.sbt

2015-10-28 Thread Frederick Ayala
Hi,

I am getting an error when adding flink-ml to the libraryDependencies on my
build.sbt file:

[error] (*:update) sbt.ResolveException: unresolved dependency:
org.scalanlp#breeze_${scala.binary.version};0.11.2: not found

My libraryDependencies is:

libraryDependencies ++= Seq("org.apache.flink" % "flink-scala" % "0.9.1",
"org.apache.flink" % "flink-streaming-scala" % "0.9.1", "org.apache.flink"
% "flink-clients" % "0.9.1",
"org.apache.flink" % "flink-ml" % "0.9.1")

I am using scalaVersion := "2.10.6"

If I remove flink-ml all the other dependencies are resolved.

Could you help me to figure out a solution for this?

Thanks!

Frederick Ayala


Re: Flink-ml multiple linear regression fit

2015-09-20 Thread Florian Heyl
Hi Stephan,

Yeah, I forgot the breeze library. Thanks. Unfortunately there is still another
problem when I am running the pipeline on HDFS.
I tried to figure out what the cause of the problem is, and I am mainly stuck at
the collect method for the datasets.
// List( (1.0, 1.0), (2.0, 2.0), ... (1.0,1.0) )
val list_JoinPredictionAndOriginal = JoinPredictionAndOriginal.collect
The line causes the errors (see below). Maybe I am still missing some 
libraries. The jar is packed now with the breeze, netlib, flink-ml, flink-core 
kryo and minlog libraries.
Thank you for any help and your time.

Best wishes,
Flo

Error: java.lang.NoClassDefFoundError: org/netlib/blas/Ddot
at com.github.fommil.netlib.F2jBLAS.ddot(F2jBLAS.java:71)
at org.apache.flink.ml.math.BLAS$.dot(BLAS.scala:123)
at org.apache.flink.ml.math.BLAS$.dot(BLAS.scala:106)
at 
org.apache.flink.ml.optimization.LinearPrediction$.predict(PredictionFunction.scala:34)
at 
org.apache.flink.ml.optimization.GenericLossFunction.lossGradient(LossFunction.scala:83)
at 
org.apache.flink.ml.optimization.LossFunction$class.loss(LossFunction.scala:43)
at 
org.apache.flink.ml.optimization.GenericLossFunction.loss(LossFunction.scala:71)
at 
org.apache.flink.ml.optimization.GradientDescent$$anonfun$org$apache$flink$ml$optimization$GradientDescent$$calculateLoss$1.apply(GradientDescent.scala:237)
at 
org.apache.flink.ml.optimization.GradientDescent$$anonfun$org$apache$flink$ml$optimization$GradientDescent$$calculateLoss$1.apply(GradientDescent.scala:237)
at 
org.apache.flink.ml.package$BroadcastSingleElementMapper.map(package.scala:86)
at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:97)
at 
org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:489)
at 
org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:354)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:581)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.netlib.blas.Ddot
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 15 more

Error: java.io.IOException: Materialization of the broadcast variable failed.
at 
org.apache.flink.runtime.broadcast.BroadcastVariableMaterialization.materializeVariable(BroadcastVariableMaterialization.java:154)
at 
org.apache.flink.runtime.broadcast.BroadcastVariableManager.materializeBroadcastVariable(BroadcastVariableManager.java:50)
at 
org.apache.flink.runtime.operators.RegularPactTask.readAndSetBroadcastInput(RegularPactTask.java:432)
at 
org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:350)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:581)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flink.runtime.io.network.partition.ProducerFailedException
at 
org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextLookAhead(LocalInputChannel.java:270)
at 
org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.onNotification(LocalInputChannel.java:238)
at 
org.apache.flink.runtime.io.network.partition.PipelinedSubpartition.release(PipelinedSubpartition.java:158)
at 
org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:300)
at 
org.apache.flink.runtime.io.network.partition.ResultPartitionManager.releasePartitionsProducedBy(ResultPartitionManager.java:95)
at 
org.apache.flink.runtime.io.network.NetworkEnvironment.unregisterTask(NetworkEnvironment.java:357)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:679)
... 1 more



Am 20.09.2015 um 02:01 schrieb Stephan Ewen <se...@apache.org>:

> Hi!
> 
> Looks like you submitted the program JAR, but it did not contain all required 
> libraries, like the breeze JAR.
> 
> Did you build a proper fat jar, or how did you package the program?
> 
> Greetings,
> Stephan
> 
> On Fri, Sep 18, 2015 at 8:22 PM, Florian Heyl <f.h...@gmx.de> wrote:
> Hey Guys need your help again,
> I am currently having problems with the multiple linear regression from the 
> flink-ml on the HDFS. 
> Locally it works fine with the 0.9-SNAPSHOT. The cluster runs with the 
> 0.10-SNAPSHOT. The code is the following:
> // set linear regression with parameters:
> val mlr = MultipleLinearRegression()
> .setStepsize(0.001)
> .setIterations(10)
> .setConvergenceThreshold(0.001)
> 
> // do linear regression and time the method
> val model = mlr.fit(transformTrain)
> 
> // The fitted model can now be used to make predictions
> val predictions = mlr.predict(tranformTest)
> The dataset transformTrain has the fo

Re: Flink-ml multiple linear regression fit

2015-09-20 Thread Stephan Ewen
You are again missing a library.

There seems to be something quite complicated about your build setup.

I would go for the ML quickstart or Maven template, which will package a
correct fat jar automatically.
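For an sbt build like the one in this thread, the standard way to produce such a fat jar is the sbt-assembly plugin. A minimal sketch — the plugin coordinates and the merge strategy below are assumptions, not taken from this thread:

```scala
// project/plugins.sbt (assumed plugin version):
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt: bundle the job together with its transitive dependencies
// (breeze, netlib, ...) so the submitted jar is self-contained.
assemblyMergeStrategy in assembly := {
  // duplicate META-INF entries from different jars would otherwise
  // make assembly fail, so drop them
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
```

Running `sbt assembly` then writes a single jar under `target/scala-2.10/`, and that jar is what gets submitted with `flink run`.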



On Sun, Sep 20, 2015 at 2:15 PM, Florian Heyl <f.h...@gmx.de> wrote:

> Hi Stephan,
>
> Yeah I forgot the breeze library. Thanks. Unfortunately there is still
> another problem when I am running the pipeline on the hdfs.
> I tried to figure out what the cause of the problem is and I am mainly
> stuck at the collect method for the datasets.
>
> // List( (1.0, 1.0), (2.0, 2.0), ... (1.0,1.0) )
> val list_JoinPredictionAndOriginal = JoinPredictionAndOriginal.collect
>
> The line causes the errors (see below). Maybe I am still missing some
> libraries. The jar is packed now with the breeze, netlib, flink-ml,
> flink-core kryo and minlog libraries.
> Thank you for any help and your time.
>
> Best wishes,
> Flo
>
> Error: java.lang.NoClassDefFoundError: org/netlib/blas/Ddot
> at com.github.fommil.netlib.F2jBLAS.ddot(F2jBLAS.java:71)
> at org.apache.flink.ml.math.BLAS$.dot(BLAS.scala:123)
> at org.apache.flink.ml.math.BLAS$.dot(BLAS.scala:106)
> at
> org.apache.flink.ml.optimization.LinearPrediction$.predict(PredictionFunction.scala:34)
> at
> org.apache.flink.ml.optimization.GenericLossFunction.lossGradient(LossFunction.scala:83)
> at
> org.apache.flink.ml.optimization.LossFunction$class.loss(LossFunction.scala:43)
> at
> org.apache.flink.ml.optimization.GenericLossFunction.loss(LossFunction.scala:71)
> at
> org.apache.flink.ml.optimization.GradientDescent$$anonfun$org$apache$flink$ml$optimization$GradientDescent$$calculateLoss$1.apply(GradientDescent.scala:237)
> at
> org.apache.flink.ml.optimization.GradientDescent$$anonfun$org$apache$flink$ml$optimization$GradientDescent$$calculateLoss$1.apply(GradientDescent.scala:237)
> at
> org.apache.flink.ml.package$BroadcastSingleElementMapper.map(package.scala:86)
> at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:97)
> at
> org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:489)
> at
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:354)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:581)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: org.netlib.blas.Ddot
> at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 15 more
>
> Error: java.io.IOException: Materialization of the broadcast variable
> failed.
> at
> org.apache.flink.runtime.broadcast.BroadcastVariableMaterialization.materializeVariable(BroadcastVariableMaterialization.java:154)
> at
> org.apache.flink.runtime.broadcast.BroadcastVariableManager.materializeBroadcastVariable(BroadcastVariableManager.java:50)
> at
> org.apache.flink.runtime.operators.RegularPactTask.readAndSetBroadcastInput(RegularPactTask.java:432)
> at
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:350)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:581)
> at java.lang.Thread.run(Thread.java:745)
> Caused by:
> org.apache.flink.runtime.io.network.partition.ProducerFailedException
> at
> org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextLookAhead(LocalInputChannel.java:270)
> at
> org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.onNotification(LocalInputChannel.java:238)
> at
> org.apache.flink.runtime.io.network.partition.PipelinedSubpartition.release(PipelinedSubpartition.java:158)
> at
> org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:300)
> at
> org.apache.flink.runtime.io.network.partition.ResultPartitionManager.releasePartitionsProducedBy(ResultPartitionManager.java:95)
> at
> org.apache.flink.runtime.io.network.NetworkEnvironment.unregisterTask(NetworkEnvironment.java:357)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:679)
> ... 1 more
>
>
>
> Am 20.09.2015 um 02:01 schrieb Stephan Ewen <se...@apache.org>:
>
> Hi!
>
> Looks like you submitted the program JAR, but it did not contain all
> required libraries, like the breeze JAR.
>
> Did you build a proper fat jar, or how did you package the program?
>
> Greetings,
> Stephan
>
> On Fri, Sep 18, 2015 at 8:22 PM, Florian Heyl <f.h...@gmx.de> wrote:
>
>> Hey Guys need your help 

Re: Flink-ml multiple linear regression fit

2015-09-19 Thread Stephan Ewen
Hi!

Looks like you submitted the program JAR, but it did not contain all
required libraries, like the breeze JAR.

Did you build a proper fat jar, or how did you package the program?

Greetings,
Stephan

On Fri, Sep 18, 2015 at 8:22 PM, Florian Heyl <f.h...@gmx.de> wrote:

> Hey Guys need your help again,
> I am currently having problems with the multiple linear regression from
> the flink-ml on the HDFS.
> Locally it works fine with the 0.9-SNAPSHOT. The cluster runs with the
> 0.10-SNAPSHOT. The code is the following:
>
> // set linear regression with parameters:
> val mlr = MultipleLinearRegression()
> .setStepsize(0.001)
> .setIterations(10)
> .setConvergenceThreshold(0.001)
>
> // do linear regression and time the method
> val model = mlr.fit(transformTrain)
>
> // The fitted model can now be used to make predictions
> val predictions = mlr.predict(tranformTest)
>
> The dataset transformTrain has the following form (filled with doubles):
>
> LabeledVector(numList(0), DenseVector(numList(1),numList(2)))
>
> Mainly the line where the fit method (mlr.fit) is called causes the
> following error:
>
> An error occurred while invoking the program:
>
>
> The program caused an error:
>
> java.lang.NoClassDefFoundError: breeze/storage/Zero
>   at org.apache.flink.ml.pipeline.Estimator$class.fit(Estimator.scala:53)
>   at 
> org.apache.flink.ml.regression.MultipleLinearRegression.fit(MultipleLinearRegression.scala:88)
>   at Regression2$.buildModelRegression(Regression2.scala:37)
>   at 
> Regression2$$anonfun$mainRegression$1.apply$mcVI$sp(Regression2.scala:116)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>   at Regression2$.mainRegression(Regression2.scala:103)
>   at MainClass$.main(MainClass.scala:47)
>   at MainClass.main(MainClass.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
>   at 
> org.apache.flink.client.program.Client.getOptimizedPlan(Client.java:192)
>   at org.apache.flink.client.CliFrontend.info(CliFrontend.java:399)
>   at 
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:959)
>   at 
> org.apache.flink.client.web.JobSubmissionServlet.doGet(JobSubmissionServlet.java:174)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
>   at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:187)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:901)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
>   at 
> org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
>   at org.eclipse.jetty.server.Server.handle(Server.java:352)
>   at 
> org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)
>   at 
> org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1048)
>   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:549)
>   at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211)
>   at 
> org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:425)
>   at 
> org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: breeze.storage.Zero
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>   at java.
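Stephan's point above — the submitted program JAR must bundle third-party libraries such as breeze — can be sketched with Maven's shade plugin. This is a hedged illustration only: the thread does not show Florian's actual build setup, so the Maven assumption, the plugin version, and the listed excludes are examples, not his configuration.

```xml
<!-- Build a fat jar so breeze & co. travel with the program JAR.
     Plugin version and excludes are illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <artifactSet>
          <!-- Keep core Flink artifacts out: the cluster already provides them. -->
          <excludes>
            <exclude>org.apache.flink:flink-clients</exclude>
            <exclude>org.apache.flink:flink-scala</exclude>
          </excludes>
        </artifactSet>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, `mvn package` produces a jar whose classpath includes `breeze/storage/Zero`, so the `NoClassDefFoundError` from the trace above should not occur on the cluster.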

Flink-ml multiple linear regression fit

2015-09-18 Thread Florian Heyl
Hey guys, I need your help again.
I am currently having problems with the multiple linear regression from
flink-ml on HDFS.
Locally it works fine with the 0.9-SNAPSHOT. The cluster runs with the 
0.10-SNAPSHOT. The code is the following:
// set linear regression with parameters:
val mlr = MultipleLinearRegression()
.setStepsize(0.001)
.setIterations(10)
.setConvergenceThreshold(0.001)

// do linear regression and time the method
val model = mlr.fit(transformTrain)

// The fitted model can now be used to make predictions
val predictions = mlr.predict(tranformTest)
The dataset transformTrain has the following form (filled with doubles):
LabeledVector(numList(0), DenseVector(numList(1),numList(2)))
Mainly the line where the fit method (mlr.fit) is called causes the following 
error:

An error occurred while invoking the program:

The program caused an error: 

java.lang.NoClassDefFoundError: breeze/storage/Zero
at org.apache.flink.ml.pipeline.Estimator$class.fit(Estimator.scala:53)
at 
org.apache.flink.ml.regression.MultipleLinearRegression.fit(MultipleLinearRegression.scala:88)
at Regression2$.buildModelRegression(Regression2.scala:37)
at 
Regression2$$anonfun$mainRegression$1.apply$mcVI$sp(Regression2.scala:116)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at Regression2$.mainRegression(Regression2.scala:103)
at MainClass$.main(MainClass.scala:47)
at MainClass.main(MainClass.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
at 
org.apache.flink.client.program.Client.getOptimizedPlan(Client.java:192)
at org.apache.flink.client.CliFrontend.info(CliFrontend.java:399)
at 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:959)
at 
org.apache.flink.client.web.JobSubmissionServlet.doGet(JobSubmissionServlet.java:174)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:187)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:901)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at 
org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
at org.eclipse.jetty.server.Server.handle(Server.java:352)
at 
org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)
at 
org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1048)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:549)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211)
at 
org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:425)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: breeze.storage.Zero
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 39 more
Thanks for any help.

Best wishes,
Flo

Re: Flink-ML as Dependency

2015-06-11 Thread Till Rohrmann
Hi Max,

I just tested a build using Gradle (with your build.gradle file) and some
flink-ml algorithms, and it completed without the unresolved breeze
dependency problem.

I am using Gradle version 2.2.1. Which version are you using?

Since you’re using Flink’s snapshots and have specified only the local
Maven repository, can you re-install Flink and check whether the error
still occurs? Simply call mvn clean install -DskipTests
-Dmaven.javadoc.skip=true from the root directory of Flink.

Cheers,
Till

On Wed, Jun 10, 2015 at 3:38 PM Maximilian Alber alber.maximil...@gmail.com
wrote:

Hi Flinksters,

 I would like to test FlinkML. Unfortunately, I get an error when including
 it in my project. It might be my mistake, as I'm not experienced with Gradle,
 and even Google didn't make me any wiser.

 My build.gradle looks as follows:

 apply plugin: 'java'
 apply plugin: 'scala'

 //sourceCompatibility = 1.5
 version = '1.0'
 jar {
 manifest {
 attributes 'Implementation-Title': 'Test Project',
'Implementation-Version': 1.0
 }
 }

 repositories {
   mavenCentral()
   mavenLocal()
 }

 dependencies {
   compile 'org.scala-lang:scala-library:2.10.5'
   compile 'org.scala-lang:scala-compiler:2.10.5'

   compile 'org.scalanlp:breeze_2.10:0.11.2'

   compile group: 'org.apache.flink', name: 'flink-clients', version:
 '0.9-SNAPSHOT'
   compile group: 'org.apache.flink', name: 'flink-scala', version:
 '0.9-SNAPSHOT'
   compile group: 'org.apache.flink', name: 'flink-ml', version:
 '0.9-SNAPSHOT'
 }

 And I get the following error:

 alber@alberTU:/media/alber/datadisk/tmp/flink/code/test$ gradle
 compileScala
 Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar

 FAILURE: Build failed with an exception.

 * What went wrong:
 Could not resolve all dependencies for configuration ':compile'.
  Could not resolve org.scalanlp:breeze_${scala.binary.version}:0.11.2.
   Required by:
   :test:1.0  org.apache.flink:flink-ml:0.9-SNAPSHOT
 Illegal character in path at index 51:
 http://repo1.maven.org/maven2/org/scalanlp/breeze_${scala.binary.version}/0.11.2/breeze_${scala.binary.version}-0.11.2.pom

 * Try:
 Run with --stacktrace option to get the stack trace. Run with --info or
 --debug option to get more log output.

 BUILD FAILED

 Total time: 7.113 secs


 I'm thankful for any ideas!

 Cheers,
 Max

​


Re: Flink-ML as Dependency

2015-06-11 Thread Till Rohrmann
Hmm, then I assume that Gradle 2 can properly handle Maven property
variables.

On Thu, Jun 11, 2015 at 3:05 PM Maximilian Alber alber.maximil...@gmail.com
wrote:

 Hi Till,

 I use the standard one for Ubuntu 15.04, which is 1.5.

 That did not make any difference.

 Thanks and Cheers,
 Max

 On Thu, Jun 11, 2015 at 11:22 AM, Till Rohrmann trohrm...@apache.org
 wrote:

 Hi Max,

 I just tested a build using gradle (with your build.gradle file) and some
 flink-ml algorithms. And it was completed without the problem of the
 unresolved breeze dependency.

 I use the version 2.2.1 of Gradle. Which version are you using?

 Since you’re using Flink’s snapshots and have specified only the local
 maven repository, can you re-install flink again and check whether the
 error still occurs? Simple call mvn clean install -DskipTests
 -Dmaven.javadoc.skip=true from the root directory of Flink.

 Cheers,
 Till

 On Wed, Jun 10, 2015 at 3:38 PM Maximilian Alber
 alber.maximil...@gmail.com
 wrote:

 Hi Flinksters,

 I would like to test FlinkML. Unfortunately, I get an error when
 including it to my project. It might be my error as I'm not experienced
 with Gradle, but with Google I got any wiser.

 My build.gradle looks as follows:

 apply plugin: 'java'
 apply plugin: 'scala'

 //sourceCompatibility = 1.5
 version = '1.0'
 jar {
 manifest {
 attributes 'Implementation-Title': 'Test Project',
'Implementation-Version': 1.0
 }
 }

 repositories {
   mavenCentral()
   mavenLocal()
 }

 dependencies {
   compile 'org.scala-lang:scala-library:2.10.5'
   compile 'org.scala-lang:scala-compiler:2.10.5'

   compile 'org.scalanlp:breeze_2.10:0.11.2'

   compile group: 'org.apache.flink', name: 'flink-clients', version:
 '0.9-SNAPSHOT'
   compile group: 'org.apache.flink', name: 'flink-scala', version:
 '0.9-SNAPSHOT'
   compile group: 'org.apache.flink', name: 'flink-ml', version:
 '0.9-SNAPSHOT'
 }

 And I get the following error:

 alber@alberTU:/media/alber/datadisk/tmp/flink/code/test$ gradle
 compileScala
 Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar

 FAILURE: Build failed with an exception.

 * What went wrong:
 Could not resolve all dependencies for configuration ':compile'.
  Could not resolve org.scalanlp:breeze_${scala.binary.version}:0.11.2.
   Required by:
   :test:1.0  org.apache.flink:flink-ml:0.9-SNAPSHOT
 Illegal character in path at index 51:
 http://repo1.maven.org/maven2/org/scalanlp/breeze_${scala.binary.version}/0.11.2/breeze_${scala.binary.version}-0.11.2.pom

 * Try:
 Run with --stacktrace option to get the stack trace. Run with --info or
 --debug option to get more log output.

 BUILD FAILED

 Total time: 7.113 secs


 I'm thankful for any ideas!

 Cheers,
 Max

 ​





Re: Flink-ML as Dependency

2015-06-11 Thread Maximilian Alber
Well then, I should update ;-)

On Thu, Jun 11, 2015 at 4:01 PM, Till Rohrmann till.rohrm...@gmail.com
wrote:

 Hmm then I assume that version 2 can properly handle maven property
 variables.


 On Thu, Jun 11, 2015 at 3:05 PM Maximilian Alber 
 alber.maximil...@gmail.com wrote:

 Hi Till,

 I use the standard one for Ubuntu 15.04, which is 1.5.

 That did not make any difference.

 Thanks and Cheers,
 Max

 On Thu, Jun 11, 2015 at 11:22 AM, Till Rohrmann trohrm...@apache.org
 wrote:

 Hi Max,

 I just tested a build using gradle (with your build.gradle file) and
 some flink-ml algorithms. And it was completed without the problem of the
 unresolved breeze dependency.

 I use the version 2.2.1 of Gradle. Which version are you using?

 Since you’re using Flink’s snapshots and have specified only the local
 maven repository, can you re-install flink again and check whether the
 error still occurs? Simple call mvn clean install -DskipTests
 -Dmaven.javadoc.skip=true from the root directory of Flink.

 Cheers,
 Till

 On Wed, Jun 10, 2015 at 3:38 PM Maximilian Alber
 alber.maximil...@gmail.com
 wrote:

 Hi Flinksters,

 I would like to test FlinkML. Unfortunately, I get an error when
 including it to my project. It might be my error as I'm not experienced
 with Gradle, but with Google I got any wiser.

 My build.gradle looks as follows:

 apply plugin: 'java'
 apply plugin: 'scala'

 //sourceCompatibility = 1.5
 version = '1.0'
 jar {
 manifest {
 attributes 'Implementation-Title': 'Test Project',
'Implementation-Version': 1.0
 }
 }

 repositories {
   mavenCentral()
   mavenLocal()
 }

 dependencies {
   compile 'org.scala-lang:scala-library:2.10.5'
   compile 'org.scala-lang:scala-compiler:2.10.5'

   compile 'org.scalanlp:breeze_2.10:0.11.2'

   compile group: 'org.apache.flink', name: 'flink-clients', version:
 '0.9-SNAPSHOT'
   compile group: 'org.apache.flink', name: 'flink-scala', version:
 '0.9-SNAPSHOT'
   compile group: 'org.apache.flink', name: 'flink-ml', version:
 '0.9-SNAPSHOT'
 }

 And I get the following error:

 alber@alberTU:/media/alber/datadisk/tmp/flink/code/test$ gradle
 compileScala
 Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar

 FAILURE: Build failed with an exception.

 * What went wrong:
 Could not resolve all dependencies for configuration ':compile'.
  Could not resolve org.scalanlp:breeze_${scala.binary.version}:0.11.2.
   Required by:
   :test:1.0  org.apache.flink:flink-ml:0.9-SNAPSHOT
 Illegal character in path at index 51:
 http://repo1.maven.org/maven2/org/scalanlp/breeze_${scala.binary.version}/0.11.2/breeze_${scala.binary.version}-0.11.2.pom

 * Try:
 Run with --stacktrace option to get the stack trace. Run with --info or
 --debug option to get more log output.

 BUILD FAILED

 Total time: 7.113 secs


 I'm thankful for any ideas!

 Cheers,
 Max

 ​





Re: Flink-ML as Dependency

2015-06-11 Thread Maximilian Alber
Hi Till,

Thanks for the quick help!

Cheers,
Max

On Wed, Jun 10, 2015 at 5:50 PM, Till Rohrmann till.rohrm...@gmail.com
wrote:

 Hi Max,

 I think the reason is that the flink-ml pom contains as a dependency an
 artifact with artifactId breeze_${scala.binary.version}. The variable
 scala.binary.version is defined in the parent pom and not substituted
 when flink-ml is installed. Therefore gradle tries to find a dependency
 with the name breeze_${scala.binary.version}.

 I try to find a solution for this problem. As a quick work around you
 should be able to define the variable manually and set it to 2.10.

 Cheers,
 Till
 ​

 On Wed, Jun 10, 2015 at 3:38 PM Maximilian Alber 
 alber.maximil...@gmail.com wrote:

 Hi Flinksters,

 I would like to test FlinkML. Unfortunately, I get an error when
 including it to my project. It might be my error as I'm not experienced
 with Gradle, but with Google I got any wiser.

 My build.gradle looks as follows:

 apply plugin: 'java'
 apply plugin: 'scala'

 //sourceCompatibility = 1.5
 version = '1.0'
 jar {
 manifest {
 attributes 'Implementation-Title': 'Test Project',
'Implementation-Version': 1.0
 }
 }

 repositories {
   mavenCentral()
   mavenLocal()
 }

 dependencies {
   compile 'org.scala-lang:scala-library:2.10.5'
   compile 'org.scala-lang:scala-compiler:2.10.5'

   compile 'org.scalanlp:breeze_2.10:0.11.2'

   compile group: 'org.apache.flink', name: 'flink-clients', version:
 '0.9-SNAPSHOT'
   compile group: 'org.apache.flink', name: 'flink-scala', version:
 '0.9-SNAPSHOT'
   compile group: 'org.apache.flink', name: 'flink-ml', version:
 '0.9-SNAPSHOT'
 }

 And I get the following error:

 alber@alberTU:/media/alber/datadisk/tmp/flink/code/test$ gradle
 compileScala
 Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar

 FAILURE: Build failed with an exception.

 * What went wrong:
 Could not resolve all dependencies for configuration ':compile'.
  Could not resolve org.scalanlp:breeze_${scala.binary.version}:0.11.2.
   Required by:
   :test:1.0  org.apache.flink:flink-ml:0.9-SNAPSHOT
 Illegal character in path at index 51:
 http://repo1.maven.org/maven2/org/scalanlp/breeze_${scala.binary.version}/0.11.2/breeze_${scala.binary.version}-0.11.2.pom

 * Try:
 Run with --stacktrace option to get the stack trace. Run with --info or
 --debug option to get more log output.

 BUILD FAILED

 Total time: 7.113 secs


 I'm thankful for any ideas!

 Cheers,
 Max




Re: Flink-ML as Dependency

2015-06-10 Thread Till Rohrmann
Hi Max,

I think the reason is that the flink-ml pom declares a dependency on an
artifact with artifactId breeze_${scala.binary.version}. The variable
scala.binary.version is defined in the parent pom and is not substituted when
flink-ml is installed. Therefore Gradle tries to find a dependency with the
literal name breeze_${scala.binary.version}.

I will try to find a solution for this problem. As a quick workaround, you
should be able to define the variable manually and set it to 2.10.

Cheers,
Till
​

On Wed, Jun 10, 2015 at 3:38 PM Maximilian Alber alber.maximil...@gmail.com
wrote:

 Hi Flinksters,

 I would like to test FlinkML. Unfortunately, I get an error when including
 it to my project. It might be my error as I'm not experienced with Gradle,
 but with Google I got any wiser.

 My build.gradle looks as follows:

 apply plugin: 'java'
 apply plugin: 'scala'

 //sourceCompatibility = 1.5
 version = '1.0'
 jar {
 manifest {
 attributes 'Implementation-Title': 'Test Project',
'Implementation-Version': 1.0
 }
 }

 repositories {
   mavenCentral()
   mavenLocal()
 }

 dependencies {
   compile 'org.scala-lang:scala-library:2.10.5'
   compile 'org.scala-lang:scala-compiler:2.10.5'

   compile 'org.scalanlp:breeze_2.10:0.11.2'

   compile group: 'org.apache.flink', name: 'flink-clients', version:
 '0.9-SNAPSHOT'
   compile group: 'org.apache.flink', name: 'flink-scala', version:
 '0.9-SNAPSHOT'
   compile group: 'org.apache.flink', name: 'flink-ml', version:
 '0.9-SNAPSHOT'
 }

 And I get the following error:

 alber@alberTU:/media/alber/datadisk/tmp/flink/code/test$ gradle
 compileScala
 Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar

 FAILURE: Build failed with an exception.

 * What went wrong:
 Could not resolve all dependencies for configuration ':compile'.
  Could not resolve org.scalanlp:breeze_${scala.binary.version}:0.11.2.
   Required by:
   :test:1.0  org.apache.flink:flink-ml:0.9-SNAPSHOT
 Illegal character in path at index 51:
 http://repo1.maven.org/maven2/org/scalanlp/breeze_${scala.binary.version}/0.11.2/breeze_${scala.binary.version}-0.11.2.pom

 * Try:
 Run with --stacktrace option to get the stack trace. Run with --info or
 --debug option to get more log output.

 BUILD FAILED

 Total time: 7.113 secs


 I'm thankful for any ideas!

 Cheers,
 Max
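Till's quick workaround could be expressed in the build.gradle quoted above roughly as follows. This is a sketch, not a tested configuration: it drops the unresolvable breeze_${scala.binary.version} coordinate from flink-ml and relies on the breeze_2.10 artifact that the build file already declares explicitly.

```groovy
dependencies {
  // Exclude the artifact whose name still contains the unsubstituted
  // ${scala.binary.version} placeholder ...
  compile(group: 'org.apache.flink', name: 'flink-ml', version: '0.9-SNAPSHOT') {
    exclude group: 'org.scalanlp'
  }
  // ... and pin the concrete Scala 2.10 build of breeze instead.
  compile 'org.scalanlp:breeze_2.10:0.11.2'
}
```

Gradle then never has to resolve the placeholder coordinate, so the "Illegal character in path" error from the original report should disappear.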



Re: flink ml - k-means

2015-05-13 Thread Pa Rö
okay :)

now I use the following example code from here:
https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java

2015-05-11 21:56 GMT+02:00 Stephan Ewen se...@apache.org:

 Paul!

 Can you use the KMeans example? The code is for three-dimensional points,
 but you should be able to generalize it easily.
 That would be the fastest way to go. without waiting for any release
 dates...

 Stephan


 On Mon, May 11, 2015 at 2:46 PM, Pa Rö paul.roewer1...@googlemail.com
 wrote:

 hi,

 now i want implement kmeans with flink,
 maybe you know a release date for flink ml kmeans?

 best regards
 paul

 2015-04-27 9:36 GMT+02:00 Pa Rö paul.roewer1...@googlemail.com:

 Hi Alexander and Till,

  thanks for the information, I look forward to the release.
  I'm curious how well flink ml performs against mahout and spark ml.

  best regards
 Paul

 2015-04-27 9:23 GMT+02:00 Till Rohrmann trohrm...@apache.org:

 Hi Paul,

 if you can't wait, a vanilla implementation is already contained as
 part of the Flink examples. You should find it under flink/flink-examples.

 But we will try to add more clustering algorithms in the near future.

 Cheers,
 Till
 On Apr 26, 2015 11:14 PM, Alexander Alexandrov 
 alexander.s.alexand...@gmail.com wrote:

 Yes, I expect to have one in the next few weeks (the code is actually
 there, but we need to port it to the Flink ML API). I suggest to follow 
 the
 JIRA issue in the next weeks to check when this is done:

 https://issues.apache.org/jira/browse/FLINK-1731

 Regards,
 Alexander

 PS. Bear in mind that we will start with a vanilla implementation of
 K-Means. For a thorough evaluation you might want to also check variants
 like K-Means++.


 2015-04-24 15:08 GMT+02:00 Pa Rö paul.roewer1...@googlemail.com:

 hi flink community,

 at the time I write my master thesis in the field machine learning.
 My main task is to evaluated different k-means variants for large data 
 sets
 (BigData). I would like test flink ml against Apache Mahout and Apache
 Hadoop MapReduce in areas of scalability and performance(time and space).
 What is the current state for the purpose of clustering, especially
 K-Means? Will there be in the near future a release information this?

 best greetings
 paul
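Stephan's suggested generalization of the example can be sketched independently of the Flink APIs. NDPoint and nearestCentroid below are hypothetical names, not classes from the Flink KMeans example; the sketch only shows the n-dimensional distance and the assignment step of k-means that replace the example's fixed-arity Point.

```java
// Hypothetical sketch: an n-dimensional point with the two pieces of logic
// the Flink example hard-codes for fixed dimensions.
class NDPoint {
    final double[] coords;

    NDPoint(double... coords) { this.coords = coords; }

    // Squared Euclidean distance: sufficient for nearest-centroid
    // comparisons and avoids the sqrt.
    double squaredDistanceTo(NDPoint other) {
        double sum = 0.0;
        for (int i = 0; i < coords.length; i++) {
            double d = coords[i] - other.coords[i];
            sum += d * d;
        }
        return sum;
    }

    // The assignment step of k-means: index of the closest centroid.
    static int nearestCentroid(NDPoint p, NDPoint[] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double d = p.squaredDistanceTo(centroids[i]);
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return best;
    }

    public static void main(String[] args) {
        NDPoint[] centroids = { new NDPoint(0, 0, 0), new NDPoint(10, 10, 10) };
        System.out.println(nearestCentroid(new NDPoint(9, 8, 10), centroids)); // prints 1
    }
}
```

In the example program, the map function that assigns each point to a centroid would call nearestCentroid instead of the hard-coded per-field distance, and the reduce that averages points would loop over the coords array.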








Re: flink ml k means relase

2015-05-11 Thread Robert Metzger
Hi,

the community hasn't decided on a plan for releasing Flink 0.9 yet.
Here, you can track the progress for the Flink ML variant of KMeans:
https://issues.apache.org/jira/browse/FLINK-1731

There is also a KMeans implementation in the examples of Flink. Maybe that
is sufficient for now?

--Robert


On Mon, May 11, 2015 at 2:50 PM, Pa Rö paul.roewer1...@googlemail.com
wrote:

 hi,

 now I want to implement kmeans with flink.
 Maybe you know a release date for flink ml kmeans?

 best regards
 paul



Re: flink ml - k-means

2015-05-11 Thread Stephan Ewen
Paul!

Can you use the KMeans example? The code is for three-dimensional points,
but you should be able to generalize it easily.
That would be the fastest way to go, without waiting for any release
dates...

Stephan


On Mon, May 11, 2015 at 2:46 PM, Pa Rö paul.roewer1...@googlemail.com
wrote:

 hi,

 now I want to implement kmeans with flink.
 Maybe you know a release date for flink ml kmeans?

 best regards
 paul

 2015-04-27 9:36 GMT+02:00 Pa Rö paul.roewer1...@googlemail.com:

 Hi Alexander and Till,

 thanks for the information, I look forward to the release.
 I'm curious how well flink ml performs against mahout and spark ml.

 best regards
 Paul

 2015-04-27 9:23 GMT+02:00 Till Rohrmann trohrm...@apache.org:

 Hi Paul,

 if you can't wait, a vanilla implementation is already contained as part
 of the Flink examples. You should find it under flink/flink-examples.

 But we will try to add more clustering algorithms in the near future.

 Cheers,
 Till
 On Apr 26, 2015 11:14 PM, Alexander Alexandrov 
 alexander.s.alexand...@gmail.com wrote:

 Yes, I expect to have one in the next few weeks (the code is actually
 there, but we need to port it to the Flink ML API). I suggest to follow the
 JIRA issue in the next weeks to check when this is done:

 https://issues.apache.org/jira/browse/FLINK-1731

 Regards,
 Alexander

 PS. Bear in mind that we will start with a vanilla implementation of
 K-Means. For a thorough evaluation you might want to also check variants
 like K-Means++.


 2015-04-24 15:08 GMT+02:00 Pa Rö paul.roewer1...@googlemail.com:

 hi flink community,

 at the moment I am writing my master thesis in the field of machine learning. My
 main task is to evaluate different k-means variants for large data sets
 (Big Data). I would like to test flink ml against Apache Mahout and Apache
 Hadoop MapReduce in terms of scalability and performance (time and space).
 What is the current state of clustering support, especially K-Means?
 Will there be a release with this in the near future?

 best greetings
 paul






