Re: Help with Flink experimental Table API

2015-06-10 Thread Shiti Saxena
Hi Aljoscha,

Could you please point me to the JIRA tickets? If you could provide some
guidance on how to resolve these, I will work on them and raise a
pull request.

Thanks,
Shiti

On Thu, Jun 11, 2015 at 11:31 AM, Aljoscha Krettek 
wrote:

> Hi,
> yes, I think the problem is that the RowSerializer does not support
> null values. I think we can add support for this; I will open a JIRA issue.
>
> Another problem I then see is that the aggregations cannot properly deal
> with null values. This would need separate support.
>
> Regards,
> Aljoscha
>
> On Thu, 11 Jun 2015 at 06:41 Shiti Saxena  wrote:
>
>> Hi,
>>
>> In our project, we are using the Flink Table API and are facing the
>> following issues,
>>
>> We load data from a CSV file and create a DataSet[Row]. The CSV file can
>> also have invalid entries in some of the fields which we replace with null
>> when building the DataSet[Row].
>>
>> This DataSet[Row] is later transformed to a Table whenever required, and
>> specific operations such as select, aggregate, etc. are performed.
>>
>> When a null value is encountered, we get a null pointer exception and the
>> whole job fails. (We can see this by calling collect on the resulting
>> DataSet).
>>
>> The error message is similar to,
>>
>> Job execution failed.
>> org.apache.flink.runtime.client.JobExecutionException: Job execution
>> failed.
>> at
>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:315)
>> at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>> at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>> at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>> at
>> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:43)
>> at
>> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>> at
>> org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>> at
>> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:94)
>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> at
>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> at
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> Caused by: java.lang.NullPointerException
>> at
>> org.apache.flink.api.common.typeutils.base.IntSerializer.serialize(IntSerializer.java:63)
>> at
>> org.apache.flink.api.common.typeutils.base.IntSerializer.serialize(IntSerializer.java:27)
>> at
>> org.apache.flink.api.table.typeinfo.RowSerializer.serialize(RowSerializer.scala:80)
>> at
>> org.apache.flink.api.table.typeinfo.RowSerializer.serialize(RowSerializer.scala:28)
>> at
>> org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:51)
>> at
>> org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:76)
>> at
>> org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:83)
>> at
>> org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65)
>> at
>> org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
>> at
>> org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
>> at
>> org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:177)
>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
>> at java.lang.Thread.run(Thread.java:724)
>>
>> Could this be because the RowSerializer does not support null values?
>> (similar to FLINK-629)
>>
>> Currently, to overcome this issue, we are ignoring all the rows which may
>> have null values. For example, we have a method cleanData defined as,
>>
>> def cleanData(table: Table, relevantColumns: Seq[String]): Table = {
>>   val whereClause: String = relevantColumns.map { cName =>
>>     s"$cName.isNotNull"
>>   }.mkString(" && ")
>>
>>   table.select(relevantColumns.mkString(",")).where(whereClause)
>> }
>>
>> Before operating on any Table, we use this method and then continue with
>> the task.
>>
>> Is this the right way to handle this? If not, please let me know how to go
>> about it.
>>
>>
>>

Re: Best wishes for Kostas Tzoumas and Robert Metzger

2015-06-10 Thread Hawin Jiang
Hi Robert

Congrats on your presentation. I have downloaded your slides.
Hopefully Flink can move forward quickly.



Best regards
Hawin

On Wed, Jun 10, 2015 at 10:14 PM, Robert Metzger 
wrote:

> Hi Hawin,
>
> here are the slides:
> http://www.slideshare.net/robertmetzger1/apache-flink-deepdive-hadoop-summit-2015-in-san-jose-ca
> Thank you for the wishes. The talk was very well received.
>
> On Wed, Jun 10, 2015 at 10:41 AM, Hawin Jiang 
> wrote:
>
>> Hi Michels,
>>
>> I don't think you can watch them online now.
>>
>> Can someone share their presentations or feedback with us?
>> Thanks
>>
>>
>>
>> Best regards
>> Hawin
>>
>> On Mon, Jun 8, 2015 at 2:34 AM, Maximilian Michels 
>> wrote:
>>
>>> Thank you for your kind wishes :) Good luck from me as well!
>>>
>>> I was just wondering, is it possible to stream the talks or watch them
>>> later on?
>>>
>>> On Mon, Jun 8, 2015 at 2:54 AM, Hawin Jiang 
>>> wrote:
>>>
 Hi All



 As you know, Kostas Tzoumas and Robert Metzger will give two
 Flink talks at the 2015 Hadoop Summit.

 That is an excellent opportunity to introduce Apache Flink to the
 world.

 Best wishes for Kostas Tzoumas and Robert Metzger.





 Here is the detailed info:



 Topic: Apache Flink deep-dive

 Time: 1:45pm - 2:25pm 2015/06/10

 Speakers: Kostas Tzoumas and Robert Metzger



 Topic: Flexible and Real-time Stream Processing with Apache Flink

 Time: 3:10pm - 3:50pm 2015/06/11

 Speakers: Kostas Tzoumas and Robert Metzger











 Best regards

 Hawin

>>>
>>>
>>
>


Re: Help with Flink experimental Table API

2015-06-10 Thread Aljoscha Krettek
Hi,
yes, I think the problem is that the RowSerializer does not support
null values. I think we can add support for this; I will open a JIRA issue.

Another problem I then see is that the aggregations cannot properly deal
with null values. This would need separate support.

Regards,
Aljoscha
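
For context, a common way to make a field-wise serializer tolerate nulls is
to prepend a null marker to every field value. A minimal Scala sketch of that
pattern (illustrative only, not the actual RowSerializer change; the helper
names are hypothetical):

import org.apache.flink.api.common.typeutils.TypeSerializer
import org.apache.flink.core.memory.{DataInputView, DataOutputView}

// Write a boolean null marker before each field value.
def serializeField[T](value: T, ser: TypeSerializer[T], out: DataOutputView): Unit = {
  if (value == null) {
    out.writeBoolean(true)  // null marker, no payload follows
  } else {
    out.writeBoolean(false)
    ser.serialize(value, out)
  }
}

// Read the marker first; deserialize a payload only for non-null fields.
def deserializeField[T](ser: TypeSerializer[T], in: DataInputView): T =
  if (in.readBoolean()) null.asInstanceOf[T]
  else ser.deserialize(in)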

On Thu, 11 Jun 2015 at 06:41 Shiti Saxena  wrote:

> Hi,
>
> In our project, we are using the Flink Table API and are facing the
> following issues,
>
> We load data from a CSV file and create a DataSet[Row]. The CSV file can
> also have invalid entries in some of the fields which we replace with null
> when building the DataSet[Row].
>
> This DataSet[Row] is later transformed to a Table whenever required, and
> specific operations such as select, aggregate, etc. are performed.
>
> When a null value is encountered, we get a null pointer exception and the
> whole job fails. (We can see this by calling collect on the resulting
> DataSet).
>
> The error message is similar to,
>
> Job execution failed.
> org.apache.flink.runtime.client.JobExecutionException: Job execution
> failed.
> at
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:315)
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> at
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:43)
> at
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> at
> org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> at
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:94)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.NullPointerException
> at
> org.apache.flink.api.common.typeutils.base.IntSerializer.serialize(IntSerializer.java:63)
> at
> org.apache.flink.api.common.typeutils.base.IntSerializer.serialize(IntSerializer.java:27)
> at
> org.apache.flink.api.table.typeinfo.RowSerializer.serialize(RowSerializer.scala:80)
> at
> org.apache.flink.api.table.typeinfo.RowSerializer.serialize(RowSerializer.scala:28)
> at
> org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:51)
> at
> org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:76)
> at
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:83)
> at
> org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65)
> at
> org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
> at
> org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
> at
> org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:177)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
> at java.lang.Thread.run(Thread.java:724)
>
> Could this be because the RowSerializer does not support null values?
> (similar to FLINK-629)
>
> Currently, to overcome this issue, we are ignoring all the rows which may
> have null values. For example, we have a method cleanData defined as,
>
> def cleanData(table: Table, relevantColumns: Seq[String]): Table = {
>   val whereClause: String = relevantColumns.map { cName =>
>     s"$cName.isNotNull"
>   }.mkString(" && ")
>
>   table.select(relevantColumns.mkString(",")).where(whereClause)
> }
>
> Before operating on any Table, we use this method and then continue with
> the task.
>
> Is this the right way to handle this? If not, please let me know how to go
> about it.
>
>
> Thanks,
> Shiti
>
>
>
>


Re: Best wishes for Kostas Tzoumas and Robert Metzger

2015-06-10 Thread Robert Metzger
Hi Hawin,

here are the slides:
http://www.slideshare.net/robertmetzger1/apache-flink-deepdive-hadoop-summit-2015-in-san-jose-ca
Thank you for the wishes. The talk was very well received.

On Wed, Jun 10, 2015 at 10:41 AM, Hawin Jiang  wrote:

> Hi Michels,
>
> I don't think you can watch them online now.
>
> Can someone share their presentations or feedback with us?
> Thanks
>
>
>
> Best regards
> Hawin
>
> On Mon, Jun 8, 2015 at 2:34 AM, Maximilian Michels  wrote:
>
>> Thank you for your kind wishes :) Good luck from me as well!
>>
>> I was just wondering, is it possible to stream the talks or watch them
>> later on?
>>
>> On Mon, Jun 8, 2015 at 2:54 AM, Hawin Jiang 
>> wrote:
>>
>>> Hi All
>>>
>>>
>>>
>>> As you know, Kostas Tzoumas and Robert Metzger will give two
>>> Flink talks at the 2015 Hadoop Summit.
>>>
>>> That is an excellent opportunity to introduce Apache Flink to the world.
>>>
>>> Best wishes for Kostas Tzoumas and Robert Metzger.
>>>
>>>
>>>
>>>
>>>
>>> Here is the detailed info:
>>>
>>>
>>>
>>> Topic: Apache Flink deep-dive
>>>
>>> Time: 1:45pm - 2:25pm 2015/06/10
>>>
>>> Speakers: Kostas Tzoumas and Robert Metzger
>>>
>>>
>>>
>>> Topic: Flexible and Real-time Stream Processing with Apache Flink
>>>
>>> Time: 3:10pm - 3:50pm 2015/06/11
>>>
>>> Speakers: Kostas Tzoumas and Robert Metzger
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Best regards
>>>
>>> Hawin
>>>
>>
>>
>


Help with Flink experimental Table API

2015-06-10 Thread Shiti Saxena
Hi,

In our project, we are using the Flink Table API and are facing the
following issues,

We load data from a CSV file and create a DataSet[Row]. The CSV file can
also have invalid entries in some of the fields which we replace with null
when building the DataSet[Row].

This DataSet[Row] is later transformed to a Table whenever required, and
specific operations such as select, aggregate, etc. are performed.

When a null value is encountered, we get a null pointer exception and the
whole job fails. (We can see this by calling collect on the resulting
DataSet).

The error message is similar to,

Job execution failed.
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:315)
at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at
org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:43)
at
org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at
org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at
org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:94)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.NullPointerException
at
org.apache.flink.api.common.typeutils.base.IntSerializer.serialize(IntSerializer.java:63)
at
org.apache.flink.api.common.typeutils.base.IntSerializer.serialize(IntSerializer.java:27)
at
org.apache.flink.api.table.typeinfo.RowSerializer.serialize(RowSerializer.scala:80)
at
org.apache.flink.api.table.typeinfo.RowSerializer.serialize(RowSerializer.scala:28)
at
org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:51)
at
org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:76)
at
org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:83)
at
org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65)
at
org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
at
org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
at
org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:177)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:724)

Could this be because the RowSerializer does not support null values?
(similar to FLINK-629)

Currently, to overcome this issue, we are ignoring all the rows which may
have null values. For example, we have a method cleanData defined as,

def cleanData(table: Table, relevantColumns: Seq[String]): Table = {
  val whereClause: String = relevantColumns.map { cName =>
    s"$cName.isNotNull"
  }.mkString(" && ")

  table.select(relevantColumns.mkString(",")).where(whereClause)
}
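
For reference, the helper above can be applied before any other table
operation, e.g. as follows (a sketch; the row type, column names, and the
toTable conversion are made up for illustration):

import org.apache.flink.api.scala._
import org.apache.flink.api.scala.table._

// Hypothetical row type and column names, for illustration only.
case class Sale(id: Int, amount: Double)

val env = ExecutionEnvironment.getExecutionEnvironment
val sales: Table = env.fromElements(Sale(1, 10.0), Sale(2, 20.0)).toTable

// Drop rows with nulls in the relevant columns, then aggregate safely.
val cleaned: Table = cleanData(sales, Seq("id", "amount"))
val totals: Table = cleaned.groupBy('id).select('id, 'amount.sum)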

Before operating on any Table, we use this method and then continue with
the task.

Is this the right way to handle this? If not, please let me know how to go
about it.


Thanks,
Shiti


Re: Best wishes for Kostas Tzoumas and Robert Metzger

2015-06-10 Thread Hawin Jiang
Hi Michels,

I don't think you can watch them online now.

Can someone share their presentations or feedback with us?
Thanks



Best regards
Hawin

On Mon, Jun 8, 2015 at 2:34 AM, Maximilian Michels  wrote:

> Thank you for your kind wishes :) Good luck from me as well!
>
> I was just wondering, is it possible to stream the talks or watch them
> later on?
>
> On Mon, Jun 8, 2015 at 2:54 AM, Hawin Jiang  wrote:
>
>> Hi All
>>
>>
>>
>> As you know, Kostas Tzoumas and Robert Metzger will give two Flink
>> talks at the 2015 Hadoop Summit.
>>
>> That is an excellent opportunity to introduce Apache Flink to the world.
>>
>> Best wishes for Kostas Tzoumas and Robert Metzger.
>>
>>
>>
>>
>>
>> Here is the detailed info:
>>
>>
>>
>> Topic: Apache Flink deep-dive
>>
>> Time: 1:45pm - 2:25pm 2015/06/10
>>
>> Speakers: Kostas Tzoumas and Robert Metzger
>>
>>
>>
>> Topic: Flexible and Real-time Stream Processing with Apache Flink
>>
>> Time: 3:10pm - 3:50pm 2015/06/11
>>
>> Speakers: Kostas Tzoumas and Robert Metzger
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Best regards
>>
>> Hawin
>>
>
>


Re: Best way to write data to HDFS by Flink

2015-06-10 Thread Hawin Jiang
Thanks Marton
I will use this code to implement my testing.



Best regards
Hawin

On Wed, Jun 10, 2015 at 1:30 AM, Márton Balassi 
wrote:

> Dear Hawin,
>
> You can pass an HDFS path to DataStream's and DataSet's writeAsText and
> writeAsCsv methods.
> I assume that you are running a streaming topology, because your source is
> Kafka, so it would look like the following:
>
> StreamExecutionEnvironment env =
>     StreamExecutionEnvironment.getExecutionEnvironment();
>
> env.addSource(new PersistentKafkaSource(..))
>    .map(/* do your operations */)
>    .writeAsText("hdfs://<namenode>:<port>/path/to/your/file");
>
> Check out the relevant section of the streaming docs for more info. [1]
>
> [1]
> http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#connecting-to-the-outside-world
>
> Best,
>
> Marton
>
> On Wed, Jun 10, 2015 at 10:22 AM, Hawin Jiang 
> wrote:
>
>> Hi All
>>
>>
>>
>> Can someone tell me what is the best way to write data to HDFS when Flink
>> received data from Kafka?
>>
>> Big thanks for your example.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Best regards
>>
>> Hawin
>>
>>
>>
>
>




Re: Flink-ML as Dependency

2015-06-10 Thread Till Rohrmann
Hi Max,

I think the reason is that the flink-ml pom contains as a dependency an
artifact with the artifactId breeze_${scala.binary.version}. The variable
scala.binary.version is defined in the parent pom and is not substituted when
flink-ml is installed. Therefore Gradle tries to find a dependency with the
literal name breeze_${scala.binary.version}.

I will try to find a solution for this problem. As a quick workaround, you
should be able to define the variable manually and set it to 2.10.

Cheers,
Till
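
Until a proper fix lands, one way to express this workaround in Gradle is to
exclude the unresolvable transitive artifact and rely on the explicit
breeze_2.10 dependency already declared in the build (a sketch; Gradle
matches the excluded module name literally, unsubstituted property and all):

dependencies {
    compile('org.apache.flink:flink-ml:0.9-SNAPSHOT') {
        // Skip the broken breeze_${scala.binary.version} module; the explicit
        // org.scalanlp:breeze_2.10:0.11.2 dependency stands in for it.
        exclude group: 'org.scalanlp', module: 'breeze_${scala.binary.version}'
    }
}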

On Wed, Jun 10, 2015 at 3:38 PM Maximilian Alber 
wrote:

> Hi Flinksters,
>
> I would like to test FlinkML. Unfortunately, I get an error when including
> it in my project. It might be my error, as I'm not experienced with Gradle,
> but even with Google I got no wiser.
>
> My build.gradle looks as follows:
>
> apply plugin: 'java'
> apply plugin: 'scala'
>
> //sourceCompatibility = 1.5
> version = '1.0'
> jar {
>     manifest {
>         attributes 'Implementation-Title': 'Test Project',
>                    'Implementation-Version': 1.0
>     }
> }
>
> repositories {
>   mavenCentral()
>   mavenLocal()
> }
>
> dependencies {
>   compile 'org.scala-lang:scala-library:2.10.5'
>   compile 'org.scala-lang:scala-compiler:2.10.5'
>
>   compile 'org.scalanlp:breeze_2.10:0.11.2'
>
>   compile group: 'org.apache.flink', name: 'flink-clients', version:
> '0.9-SNAPSHOT'
>   compile group: 'org.apache.flink', name: 'flink-scala', version:
> '0.9-SNAPSHOT'
>   compile group: 'org.apache.flink', name: 'flink-ml', version:
> '0.9-SNAPSHOT'
> }
>
> And I get the following error:
>
> alber@alberTU:/media/alber/datadisk/tmp/flink/code/test$ gradle
> compileScala
> Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
>
> FAILURE: Build failed with an exception.
>
> * What went wrong:
> Could not resolve all dependencies for configuration ':compile'.
> > Could not resolve org.scalanlp:breeze_${scala.binary.version}:0.11.2.
>   Required by:
>   :test:1.0 > org.apache.flink:flink-ml:0.9-SNAPSHOT
>> Illegal character in path at index 51:
> http://repo1.maven.org/maven2/org/scalanlp/breeze_${scala.binary.version}/0.11.2/breeze_${scala.binary.version}-0.11.2.pom
>
> * Try:
> Run with --stacktrace option to get the stack trace. Run with --info or
> --debug option to get more log output.
>
> BUILD FAILED
>
> Total time: 7.113 secs
>
>
> I'm thankful for any ideas!
>
> Cheers,
> Max
>


Re: Flink 0.9 built with Scala 2.11

2015-06-10 Thread Kostas Tzoumas
Please do ping this list if you encounter any problems with Flink during
your project (you have done so already :-), but also if you find that the
Flink API needs additions to map Pig well to Flink.

On Wed, Jun 10, 2015 at 3:47 PM, Philipp Goetze <
philipp.goe...@tu-ilmenau.de> wrote:

> Done. Can be found here: https://issues.apache.org/jira/browse/FLINK-2200
>
> Best Regards,
> Philipp
>
>
>
> On 10.06.2015 15:29, Chiwan Park wrote:
>
>> But I think uploading Flink artifacts built with Scala 2.11 to the Maven
>> repository is a nice idea.
>> Could you create a JIRA issue?
>>
>> Regards,
>> Chiwan Park
>>
>>  On Jun 10, 2015, at 10:23 PM, Chiwan Park  wrote:
>>>
>>> No. Currently, there are no Flink binaries with scala 2.11 which are
>>> downloadable.
>>>
>>> Regards,
>>> Chiwan Park
>>>
>>>  On Jun 10, 2015, at 10:18 PM, Philipp Goetze <
 philipp.goe...@tu-ilmenau.de> wrote:

 Thank you Chiwan!

 I did not know the master has a 2.11 profile.

 But there is no pre-built Flink with 2.11, which I could refer to in
 sbt or maven, is it?

 Best Regards,
 Philipp

 On 10.06.2015 15:03, Chiwan Park wrote:

> Hi. You can build Flink with Scala 2.11 with scala-2.11 profile in
> master branch.
> `mvn clean install -DskipTests -P \!scala-2.10,scala-2.11` command
> builds Flink with Scala 2.11.
>
> Regards,
> Chiwan Park
>
>  On Jun 10, 2015, at 9:56 PM, Flavio Pompermaier 
>> wrote:
>>
>> Nice!
>>
>> On 10 Jun 2015 14:49, "Philipp Goetze" 
>> wrote:
>> Hi community!
>>
>> We started a new project called Piglet (
>> https://github.com/ksattler/piglet).
>> For that we use i.a. Flink as a backend. The project is based on
>> Scala 2.11. Thus we need a 2.11 build of Flink.
>>
>> Until now we used the 2.11 branch of the stratosphere project and
>> built Flink ourselves. Unfortunately this branch is not up-to-date.
>>
>> Do you have an official repository for Flink 0.9 (built with Scala
>> 2.11)?
>>
>> Best Regards,
>> Philipp
>>
>
>
>
>
>>>
>>>
>>>
>>
>>
>>
>


Re: Load balancing

2015-06-10 Thread Gianmarco De Francisci Morales
We have been working on an adaptive load balancing strategy that would
address exactly the issue you point out.
FLINK-1725 is the starting point for the integration.

Cheers,

--
Gianmarco

On 9 June 2015 at 20:31, Fabian Hueske  wrote:

> Hi Sebastian,
>
> I agree, shuffling only specific elements would be a very useful feature,
> but unfortunately it's not supported (yet).
> Would you like to open a JIRA for that?
>
> Cheers, Fabian
>
> 2015-06-09 17:22 GMT+02:00 Kruse, Sebastian :
>
>>  Hi folks,
>>
>>
>>
>> I would like to do some load balancing within one of my Flink jobs to
>> achieve good scalability. The rebalance() method is not applicable in my
>> case, as the runtime is dominated by the processing of very few larger
>> elements in my dataset. Hence, I need to distribute the processing work for
>> these elements among the nodes in the cluster. To do so, I subdivide those
>> elements into partial tasks and want to distribute these partial tasks to
>> other nodes by employing a custom partitioner.
>>
>>
>>
>> Now, my question is the following: actually, I do not need to shuffle the
>> complete dataset but only a few elements. So is there a way of telling the
>> partitioner that data should reside on the same task manager?
>> Thanks!
>>
>>
>>
>> Cheers,
>>
>> Sebastian
>>
>
>
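
For reference, the kind of custom partitioning Sebastian describes looks
roughly like this in the Scala DataSet API (a sketch; the partial-task tuples
and the key scheme are hypothetical):

import org.apache.flink.api.common.functions.Partitioner
import org.apache.flink.api.scala._

object PartialTaskPartitioning {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Hypothetical partial tasks: (partIndex, payload) pairs produced by
    // subdividing a few heavy elements.
    val partialTasks = env.fromElements((0, "chunk-a"), (1, "chunk-b"), (2, "chunk-c"))

    // Spread the partial tasks over all parallel channels, keyed on the
    // part index (tuple field 0).
    val spread = partialTasks.partitionCustom(new Partitioner[Int] {
      override def partition(key: Int, numPartitions: Int): Int = key % numPartitions
    }, 0)

    spread.map(t => s"processing ${t._2}").print()
  }
}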


Re: Flink 0.9 built with Scala 2.11

2015-06-10 Thread Philipp Goetze

Done. Can be found here: https://issues.apache.org/jira/browse/FLINK-2200

Best Regards,
Philipp


On 10.06.2015 15:29, Chiwan Park wrote:

But I think uploading Flink artifacts built with Scala 2.11 to the Maven
repository is a nice idea.
Could you create a JIRA issue?

Regards,
Chiwan Park


On Jun 10, 2015, at 10:23 PM, Chiwan Park  wrote:

No. Currently, there are no Flink binaries with scala 2.11 which are 
downloadable.

Regards,
Chiwan Park


On Jun 10, 2015, at 10:18 PM, Philipp Goetze  
wrote:

Thank you Chiwan!

I did not know the master has a 2.11 profile.

But there is no pre-built Flink with 2.11, which I could refer to in sbt or 
maven, is it?

Best Regards,
Philipp

On 10.06.2015 15:03, Chiwan Park wrote:

Hi. You can build Flink with Scala 2.11 with scala-2.11 profile in master 
branch.
`mvn clean install -DskipTests -P \!scala-2.10,scala-2.11` command builds Flink 
with Scala 2.11.

Regards,
Chiwan Park


On Jun 10, 2015, at 9:56 PM, Flavio Pompermaier  wrote:

Nice!

On 10 Jun 2015 14:49, "Philipp Goetze"  wrote:
Hi community!

We started a new project called Piglet (https://github.com/ksattler/piglet).
For that we use i.a. Flink as a backend. The project is based on Scala 2.11. 
Thus we need a 2.11 build of Flink.

Until now we used the 2.11 branch of the stratosphere project and built Flink 
ourselves. Unfortunately this branch is not up-to-date.

Do you have an official repository for Flink 0.9 (built with Scala 2.11)?

Best Regards,
Philipp
















Flink-ML as Dependency

2015-06-10 Thread Maximilian Alber
Hi Flinksters,

I would like to test FlinkML. Unfortunately, I get an error when including
it in my project. It might be my error, as I'm not experienced with Gradle,
but even with Google I got no wiser.

My build.gradle looks as follows:

apply plugin: 'java'
apply plugin: 'scala'

//sourceCompatibility = 1.5
version = '1.0'
jar {
    manifest {
        attributes 'Implementation-Title': 'Test Project',
                   'Implementation-Version': 1.0
    }
}

repositories {
  mavenCentral()
  mavenLocal()
}

dependencies {
  compile 'org.scala-lang:scala-library:2.10.5'
  compile 'org.scala-lang:scala-compiler:2.10.5'

  compile 'org.scalanlp:breeze_2.10:0.11.2'

  compile group: 'org.apache.flink', name: 'flink-clients', version:
'0.9-SNAPSHOT'
  compile group: 'org.apache.flink', name: 'flink-scala', version:
'0.9-SNAPSHOT'
  compile group: 'org.apache.flink', name: 'flink-ml', version:
'0.9-SNAPSHOT'
}

And I get the following error:

alber@alberTU:/media/alber/datadisk/tmp/flink/code/test$ gradle compileScala
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar

FAILURE: Build failed with an exception.

* What went wrong:
Could not resolve all dependencies for configuration ':compile'.
> Could not resolve org.scalanlp:breeze_${scala.binary.version}:0.11.2.
  Required by:
  :test:1.0 > org.apache.flink:flink-ml:0.9-SNAPSHOT
   > Illegal character in path at index 51:
http://repo1.maven.org/maven2/org/scalanlp/breeze_${scala.binary.version}/0.11.2/breeze_${scala.binary.version}-0.11.2.pom

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or
--debug option to get more log output.

BUILD FAILED

Total time: 7.113 secs


I'm thankful for any ideas!

Cheers,
Max


Re: Flink 0.9 built with Scala 2.11

2015-06-10 Thread Chiwan Park
But I think uploading Flink artifacts built with Scala 2.11 to the Maven
repository is a nice idea.
Could you create a JIRA issue?

Regards,
Chiwan Park

> On Jun 10, 2015, at 10:23 PM, Chiwan Park  wrote:
> 
> No. Currently, there are no Flink binaries with scala 2.11 which are 
> downloadable.
> 
> Regards,
> Chiwan Park
> 
>> On Jun 10, 2015, at 10:18 PM, Philipp Goetze  
>> wrote:
>> 
>> Thank you Chiwan!
>> 
>> I did not know the master has a 2.11 profile.
>> 
>> But there is no pre-built Flink with 2.11, which I could refer to in sbt or 
>> maven, is it?
>> 
>> Best Regards,
>> Philipp
>> 
>> On 10.06.2015 15:03, Chiwan Park wrote:
>>> Hi. You can build Flink with Scala 2.11 with scala-2.11 profile in master 
>>> branch.
>>> `mvn clean install -DskipTests -P \!scala-2.10,scala-2.11` command builds 
>>> Flink with Scala 2.11.
>>> 
>>> Regards,
>>> Chiwan Park
>>> 
 On Jun 10, 2015, at 9:56 PM, Flavio Pompermaier  
 wrote:
 
 Nice!
 
 On 10 Jun 2015 14:49, "Philipp Goetze"  
 wrote:
 Hi community!
 
 We started a new project called Piglet 
 (https://github.com/ksattler/piglet).
 For that we use i.a. Flink as a backend. The project is based on Scala 
 2.11. Thus we need a 2.11 build of Flink.
 
 Until now we used the 2.11 branch of the stratosphere project and built 
 Flink ourselves. Unfortunately this branch is not up-to-date.
 
 Do you have an official repository for Flink 0.9 (built with Scala 2.11)?
 
 Best Regards,
 Philipp
>>> 
>>> 
>>> 
>>> 
>> 
> 
> 
> 
> 






Re: Flink 0.9 built with Scala 2.11

2015-06-10 Thread Chiwan Park
No. Currently, there are no downloadable Flink binaries built with Scala
2.11.

Regards,
Chiwan Park

> On Jun 10, 2015, at 10:18 PM, Philipp Goetze  
> wrote:
> 
> Thank you Chiwan!
> 
> I did not know the master has a 2.11 profile.
> 
> But there is no pre-built Flink with 2.11, which I could refer to in sbt or 
> maven, is it?
> 
> Best Regards,
> Philipp
> 
> On 10.06.2015 15:03, Chiwan Park wrote:
>> Hi. You can build Flink with Scala 2.11 with scala-2.11 profile in master 
>> branch.
>> `mvn clean install -DskipTests -P \!scala-2.10,scala-2.11` command builds 
>> Flink with Scala 2.11.
>> 
>> Regards,
>> Chiwan Park
>> 
>>> On Jun 10, 2015, at 9:56 PM, Flavio Pompermaier  
>>> wrote:
>>> 
>>> Nice!
>>> 
>>> On 10 Jun 2015 14:49, "Philipp Goetze"  wrote:
>>> Hi community!
>>> 
>>> We started a new project called Piglet (https://github.com/ksattler/piglet).
>>> For that we use i.a. Flink as a backend. The project is based on Scala 
>>> 2.11. Thus we need a 2.11 build of Flink.
>>> 
>>> Until now we used the 2.11 branch of the stratosphere project and built 
>>> Flink ourselves. Unfortunately this branch is not up-to-date.
>>> 
>>> Do you have an official repository for Flink 0.9 (built with Scala 2.11)?
>>> 
>>> Best Regards,
>>> Philipp
>> 
>> 
>> 
>> 
> 






Re: Flink 0.9 built with Scala 2.11

2015-06-10 Thread Till Rohrmann
No, there are no Scala 2.11 Flink binaries which you can download. You have
to build them yourself.

Cheers,
Till

On Wed, Jun 10, 2015 at 3:19 PM Philipp Goetze 
wrote:

> Thank you Chiwan!
>
> I did not know the master has a 2.11 profile.
>
> But there is no pre-built Flink with 2.11, which I could refer to in sbt
> or maven, is it?
>
> Best Regards,
> Philipp
>
> On 10.06.2015 15:03, Chiwan Park wrote:
> > Hi. You can build Flink with Scala 2.11 with scala-2.11 profile in
> master branch.
> > `mvn clean install -DskipTests -P \!scala-2.10,scala-2.11` command
> builds Flink with Scala 2.11.
> >
> > Regards,
> > Chiwan Park
> >
> >> On Jun 10, 2015, at 9:56 PM, Flavio Pompermaier 
> wrote:
> >>
> >> Nice!
> >>
> >> On 10 Jun 2015 14:49, "Philipp Goetze" 
> wrote:
> >> Hi community!
> >>
> >> We started a new project called Piglet (
> https://github.com/ksattler/piglet).
> >> For that we use i.a. Flink as a backend. The project is based on Scala
> 2.11. Thus we need a 2.11 build of Flink.
> >>
> >> Until now we used the 2.11 branch of the stratosphere project and built
> Flink ourselves. Unfortunately this branch is not up-to-date.
> >>
> >> Do you have an official repository for Flink 0.9 (built with Scala
> 2.11)?
> >>
> >> Best Regards,
> >> Philipp
> >
> >
> >
> >
>
>


Re: Flink 0.9 built with Scala 2.11

2015-06-10 Thread Philipp Goetze

Thank you Chiwan!

I did not know the master has a 2.11 profile.

But there is no pre-built Flink with 2.11 which I could refer to in sbt
or Maven, is there?


Best Regards,
Philipp

On 10.06.2015 15:03, Chiwan Park wrote:

Hi. You can build Flink with Scala 2.11 with scala-2.11 profile in master 
branch.
`mvn clean install -DskipTests -P \!scala-2.10,scala-2.11` command builds Flink 
with Scala 2.11.

Regards,
Chiwan Park


On Jun 10, 2015, at 9:56 PM, Flavio Pompermaier  wrote:

Nice!

On 10 Jun 2015 14:49, "Philipp Goetze"  wrote:
Hi community!

We started a new project called Piglet (https://github.com/ksattler/piglet).
For that we use i.a. Flink as a backend. The project is based on Scala 2.11. 
Thus we need a 2.11 build of Flink.

Until now we used the 2.11 branch of the stratosphere project and built Flink 
ourselves. Unfortunately this branch is not up-to-date.

Do you have an official repository for Flink 0.9 (built with Scala 2.11)?

Best Regards,
Philipp









Re: Flink 0.9 built with Scala 2.11

2015-06-10 Thread Chiwan Park
Hi. You can build Flink with Scala 2.11 using the scala-2.11 profile in the
master branch.
The command `mvn clean install -DskipTests -P \!scala-2.10,scala-2.11`
builds Flink with Scala 2.11.
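
If you go the build-it-yourself route, `mvn clean install` puts the
0.9-SNAPSHOT artifacts into your local Maven repository, and an sbt build can
then resolve them like this (a sketch; in 0.9 the Flink artifact ids carry no
Scala-version suffix, hence % rather than %%):

// build.sbt -- pick up the locally installed Scala 2.11 Flink build
resolvers += Resolver.mavenLocal

libraryDependencies ++= Seq(
  "org.apache.flink" % "flink-scala"   % "0.9-SNAPSHOT",
  "org.apache.flink" % "flink-clients" % "0.9-SNAPSHOT"
)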

Regards,
Chiwan Park

> On Jun 10, 2015, at 9:56 PM, Flavio Pompermaier  wrote:
> 
> Nice!
> 
> On 10 Jun 2015 14:49, "Philipp Goetze"  wrote:
> Hi community!
> 
> We started a new project called Piglet (https://github.com/ksattler/piglet).
> For that we use i.a. Flink as a backend. The project is based on Scala 2.11. 
> Thus we need a 2.11 build of Flink.
> 
> Until now we used the 2.11 branch of the stratosphere project and built Flink 
> ourselves. Unfortunately this branch is not up-to-date.
> 
> Do you have an official repository for Flink 0.9 (built with Scala 2.11)?
> 
> Best Regards,
> Philipp







Re: Flink 0.9 built with Scala 2.11

2015-06-10 Thread Flavio Pompermaier
Nice!
On 10 Jun 2015 14:49, "Philipp Goetze"  wrote:

> Hi community!
>
> We started a new project called Piglet (https://github.com/ksattler/piglet
> ).
> For that we use i.a. Flink as a backend. The project is based on Scala
> 2.11. Thus we need a 2.11 build of Flink.
>
> Until now we used the 2.11 branch of the stratosphere project and built
> Flink ourselves. Unfortunately this branch is not up-to-date.
>
> Do you have an official repository for Flink 0.9 (built with Scala 2.11)?
>
> Best Regards,
> Philipp
>


Flink 0.9 built with Scala 2.11

2015-06-10 Thread Philipp Goetze

Hi community!

We started a new project called Piglet (https://github.com/ksattler/piglet).
For that we use, among others, Flink as a backend. The project is based on
Scala 2.11. Thus we need a 2.11 build of Flink.


Until now we used the 2.11 branch of the stratosphere project and built 
Flink ourselves. Unfortunately this branch is not up-to-date.


Do you have an official repository for Flink 0.9 (built with Scala 2.11)?

Best Regards,
Philipp


Re: Best way to write data to HDFS by Flink

2015-06-10 Thread Márton Balassi
Dear Hawin,

You can pass an HDFS path to DataStream's and DataSet's writeAsText and
writeAsCsv methods.
I assume that you are running a streaming topology, because your source is
Kafka, so it would look like the following:

StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();

env.addSource(new PersistentKafkaSource(..))
   .map(/* do your operations */)
   .writeAsText("hdfs://<namenode>:<port>/path/to/your/file");

Check out the relevant section of the streaming docs for more info. [1]

[1]
http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#connecting-to-the-outside-world

Best,

Marton
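
The same idea on the batch side, as a small self-contained Scala sketch
(host, port, and paths are placeholders):

import org.apache.flink.api.scala._

object WriteToHdfs {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val data = env.fromElements("a", "b", "c")

    // Any sink method accepts an hdfs:// URI, provided the Hadoop
    // configuration is available on the classpath.
    data.writeAsText("hdfs://namenode:9000/path/to/output")
    env.execute("write to HDFS")
  }
}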

On Wed, Jun 10, 2015 at 10:22 AM, Hawin Jiang  wrote:

> Hi All
>
>
>
> Can someone tell me what is the best way to write data to HDFS when Flink
> received data from Kafka?
>
> Big thanks for your example.
>
>
>
>
>
>
>
>
>
> Best regards
>
> Hawin
>
>
>


Best way to write data to HDFS by Flink

2015-06-10 Thread Hawin Jiang
Hi All

 

Can someone tell me what is the best way to write data to HDFS when Flink
received data from Kafka?

Big thanks for your example.

 

 

 

 

Best regards

Hawin

 



Re: Apache Flink transactions

2015-06-10 Thread Fabian Hueske
Comparing the performance of systems is not easy, and the results depend on
a lot of things, such as the configuration, the data, and the jobs.

That being said, the numbers that Bill reported for WordCount make
perfect sense, as Stephan pointed out in his response (Flink does not
feature hash-based aggregations yet).
So there are definitely use cases where Spark outperforms Flink, but there
are also other cases where both systems perform similarly or Flink is faster.
For example, more complex jobs benefit a lot from Flink's pipelined
execution, and Flink's built-in iterations are very fast, especially
delta iterations.

Best, Fabian
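
As an illustration of the delta iterations mentioned above, here is a toy
connected-components step in the Scala DataSet API (a sketch with made-up
graph data; real jobs would use the bundled ConnectedComponents example):

import org.apache.flink.api.scala._

object DeltaIterationSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val vertices = env.fromElements((1L, 1L), (2L, 2L), (3L, 3L)) // (id, component)
    val edges    = env.fromElements((1L, 2L), (2L, 3L))           // (src, dst)

    // Solution set keyed on field 0; only changed vertices stay in the workset.
    val components = vertices.iterateDelta(vertices, 10, Array(0)) {
      (solution, workset) =>
        val candidates = workset
          .join(edges).where(0).equalTo(0) { (v, e) => (e._2, v._2) }
        val updates = candidates
          .join(solution).where(0).equalTo(0)
          .filter(pair => pair._1._2 < pair._2._2) // smaller component id wins
          .map(pair => pair._1)
        (updates, updates)
    }
    components.print()
  }
}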



2015-06-10 0:53 GMT+02:00 Hawin Jiang :

> Hey Aljoscha
>
> I also sent an email to Bill asking for the latest test results. From
> Bill's email, Apache Spark's performance looks better than Flink's.
> What are your thoughts?
>
>
>
> Best regards
> Hawin
>
>
>
> On Tue, Jun 9, 2015 at 2:29 AM, Aljoscha Krettek 
> wrote:
>
>> Hi,
>> we don't have any current performance numbers. But the queries mentioned
>> on the benchmark page should be easy to implement in Flink. It could be
>> interesting if someone ported these queries and ran them with exactly the
>> same data on the same machines.
>>
>> Bill Sparks wrote on the mailing list some days ago (
>> http://mail-archives.apache.org/mod_mbox/flink-user/201506.mbox/%3cd1972778.64426%25jspa...@cray.com%3e).
>> He seems to be running some tests to compare Flink, Spark and MapReduce.
>>
>> Regards,
>> Aljoscha
>>
>> On Mon, Jun 8, 2015 at 9:09 PM, Hawin Jiang 
>> wrote:
>>
>>> Hi Aljoscha
>>>
>>> I want to know what the Apache Flink performance would be if I ran the
>>> same SQL as below.
>>> Do you have any Apache Flink benchmark information?
>>> Such as: https://amplab.cs.berkeley.edu/benchmark/
>>> Thanks.
>>>
>>>
>>>
>>> SELECT pageURL, pageRank FROM rankings WHERE pageRank > X
>>>
>>> Query 1A: 32,888 results
>>> Query 1B: 3,331,851 results
>>> Query 1C: 89,974,976 results
>>>
>>> Median response time (s)        1A        1B        1C
>>> Redshift (HDD) - Current      2.49      2.61      9.46
>>> Impala - Disk - 1.2.3        12.015    12.015    37.085
>>> Impala - Mem - 1.2.3          2.17      3.01     36.04
>>> Shark - Disk - 0.8.1          6.6       7        22.4
>>> Shark - Mem - 0.8.1           1.7       1.8       3.6
>>> Hive - 0.12 YARN             50.49     59.93     43.34
>>> Tez - 0.2.0                  28.22     36.35     26.44
>>>
>>>
>>> On Mon, Jun 8, 2015 at 2:03 AM, Aljoscha Krettek 
>>> wrote:
>>>
 Hi,
 actually, what do you want to know about Flink SQL?

 Aljoscha

 On Sat, Jun 6, 2015 at 2:22 AM, Hawin Jiang 
 wrote:
 > Thanks all
 >
 > Actually, I want to know more info about Flink SQL and Flink
 performance
 > Here is the Spark benchmark. Maybe you already saw it before.
 > https://amplab.cs.berkeley.edu/benchmark/
 >
 > Thanks.
 >
 >
 >
 > Best regards
 > Hawin
 >
 >
 >
 > On Fri, Jun 5, 2015 at 1:35 AM, Fabian Hueske 
 wrote:
 >>
 >> If you want to append data to a data set that is stored as files (e.g.,
 >> on HDFS), you can go for a directory structure as follows:
 >>
 >> dataSetRootFolder
 >>   - part1
 >> - 1
 >> - 2
 >> - ...
 >>   - part2
 >> - 1
 >> - ...
 >>   - partX
 >>
 >> Flink's file format supports recursive directory scans such that you can
 >> add new subfolders to dataSetRootFolder and read the full data set.
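
A sketch of the recursive scan Fabian refers to, in the Scala API (paths are
placeholders):

import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration

val env = ExecutionEnvironment.getExecutionEnvironment

// Enable recursive enumeration so all partX/N subfolders are read as well.
val params = new Configuration()
params.setBoolean("recursive.file.enumeration", true)

val fullDataSet = env
  .readTextFile("hdfs://namenode:9000/dataSetRootFolder")
  .withParameters(params)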
 >>
 >> 2015-06-05 9:58 GMT+02:00 Aljoscha Krettek :
 >>>
 >>> Hi,
 >>> I think the example could be made more concise by using the Table
 API.
 >>>
 http://ci.apache.org/projects/flink/flink-docs-master/libs/table.html
 >>>
 >>> Please let us know if you have questions about that, it is still
 quite
 >>> new.
 >>>
 >>> On Fri, Jun 5, 2015 at 9:03 AM, hawin 
 wrote:
 >>> > Hi Aljoscha
 >>> >
 >>> > Thanks for your reply.
 >>> > Do you have any tips for Flink SQL?
 >>> > I know that Spark supports the ORC format. How about Flink SQL?
 >>> > BTW, you have implemented the TPCHQuery10 example in 231 lines of
 >>> > code. How could that be made as simple as possible with Flink?
 >>> > I am going to use Flink in my future project.  Sorry for so many
 >>> > questions.
 >>> > I believe that you guys will make a world of difference.
 >>> >
 >>> >
 >>> > @Chiwan
 >>> > You made a very good example for me.
 >>> > Thanks a lot
 >>> >
 >>> >
 >>> >
 >>> >
 >>> >
 >>> > --
 >>> > View this message in context:
 >>> >
 http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Apache-Flink-transactions-tp1457p1494.html
 >>> > Sent from the Apache Flink User Mailing List archive. mailing list
 >>> > archive at Nabble.com.