[jira] [Resolved] (FLINK-4297) Yarn client can't determine fat jar location if path contains spaces

2016-08-05 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-4297.
-
Resolution: Fixed

Fixed for
  - 1.2.0 via c7a85545ba73e93e4a55ef8886362badaa2e2147
  - 1.1.1 via e4f62d3b38ccb93e45c50ee97dae9157a0493e47

> Yarn client can't determine fat jar location if path contains spaces
> 
>
> Key: FLINK-4297
> URL: https://issues.apache.org/jira/browse/FLINK-4297
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.2.0, 1.1.1
>
>
> The code that automatically determines the fat jar path through the 
> ProtectionDomain of the Yarn class receives a possibly URL-encoded path 
> string. We need to decode it using the system locale encoding; otherwise, 
> errors like the following occur when the file path contains spaces: 
> {noformat}
> Caused by: java.io.FileNotFoundException: File 
> file:/Users/max/Downloads/release-testing/flink-1.1.0-rc1/flink-1.1.0/build%20target/lib/flink-dist_2.11-1.1.0.jar
>  does not exist
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
> at 
> org.apache.hadoop.fs.LocalFileSystem.copyFromLocalFile(LocalFileSystem.java:82)
> at 
> org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1836)
> at org.apache.flink.yarn.Utils.setupLocalResource(Utils.java:129)
> at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:616)
> at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:365)
> ... 6 more
> {noformat}
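The failure above can be illustrated with a small decoding step. Below is a minimal sketch (class and method names are ours, not Flink's) of locating a jar via the `ProtectionDomain` and decoding the URL-encoded path component; the issue calls for the system locale encoding, while UTF-8 is used here for a deterministic example:

```java
import java.io.File;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FatJarLocator {

    // Locate the jar containing the given class via its ProtectionDomain.
    // The location is a URL, so its path component may be URL-encoded
    // ("build target" arrives as "build%20target").
    public static File locateJar(Class<?> clazz) {
        String rawPath = clazz.getProtectionDomain()
                .getCodeSource().getLocation().getPath();
        return new File(decodePath(rawPath));
    }

    // Decode the raw path before handing it to the file system, so that
    // "%20" becomes a literal space and the file can actually be found.
    public static String decodePath(String rawPath) {
        try {
            return URLDecoder.decode(rawPath, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Without the decode step, the literal `%20` is passed to Hadoop's `RawLocalFileSystem`, which then reports the `FileNotFoundException` shown above.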



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-4297) Yarn client can't determine fat jar location if path contains spaces

2016-08-05 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-4297.
---

> Yarn client can't determine fat jar location if path contains spaces
> 
>
> Key: FLINK-4297
> URL: https://issues.apache.org/jira/browse/FLINK-4297
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.2.0, 1.1.1
>
>
> The code that automatically determines the fat jar path through the 
> ProtectionDomain of the Yarn class receives a possibly URL-encoded path 
> string. We need to decode it using the system locale encoding; otherwise, 
> errors like the following occur when the file path contains spaces: 
> {noformat}
> Caused by: java.io.FileNotFoundException: File 
> file:/Users/max/Downloads/release-testing/flink-1.1.0-rc1/flink-1.1.0/build%20target/lib/flink-dist_2.11-1.1.0.jar
>  does not exist
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
> at 
> org.apache.hadoop.fs.LocalFileSystem.copyFromLocalFile(LocalFileSystem.java:82)
> at 
> org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1836)
> at org.apache.flink.yarn.Utils.setupLocalResource(Utils.java:129)
> at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:616)
> at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:365)
> ... 6 more
> {noformat}





[jira] [Created] (FLINK-4323) Checkpoint Coordinator Removes HA Checkpoints in Shutdown

2016-08-05 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-4323:
---

 Summary: Checkpoint Coordinator Removes HA Checkpoints in Shutdown
 Key: FLINK-4323
 URL: https://issues.apache.org/jira/browse/FLINK-4323
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing
Affects Versions: 1.1.0
Reporter: Stephan Ewen
Priority: Blocker
 Fix For: 1.2.0, 1.1.1


The {{CheckpointCoordinator}} has a shutdown hook that "shuts down" the 
savepoint store, rather than suspending it.

As a consequence, HA checkpoints may be lost when the JobManager process fails 
but allows the shutdown hook to run.

I would suggest removing the shutdown hook from the {{CheckpointCoordinator}} 
altogether. The JobManager process is responsible for cleanup and is better 
placed to decide what should be cleaned up and what not.
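The hazard can be illustrated with a toy store (hypothetical names, not Flink's actual store API): suspending keeps checkpoints recoverable for a new JobManager, while shutting down discards them:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy illustration of the distinction the issue draws between
// "suspend" (keep HA checkpoints recoverable) and "shutdown"
// (discard them; only safe in non-HA / standalone mode).
public class ToyCheckpointStore {
    private final Deque<String> checkpoints = new ArrayDeque<>();
    private boolean discarded = false;

    public void add(String checkpointPath) {
        checkpoints.add(checkpointPath);
    }

    // Suspend: release local resources but leave checkpoint data in
    // place, so a recovering JobManager can resume from the latest one.
    public void suspend() {
        // nothing discarded here
    }

    // Shutdown: discard all checkpoint state. Running this from a
    // shutdown hook in HA mode is exactly the hazard described above.
    public void shutdown() {
        checkpoints.clear();
        discarded = true;
    }

    public int recoverableCheckpoints() {
        return discarded ? 0 : checkpoints.size();
    }
}
```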





[jira] [Resolved] (FLINK-4323) Checkpoint Coordinator Removes HA Checkpoints in Shutdown

2016-08-05 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-4323.
-
Resolution: Invalid

The shutdown hook is only installed in "standalone" mode, which in the 
high-availability context means "non high-availability" mode.

> Checkpoint Coordinator Removes HA Checkpoints in Shutdown
> -
>
> Key: FLINK-4323
> URL: https://issues.apache.org/jira/browse/FLINK-4323
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.1.0
>Reporter: Stephan Ewen
>Priority: Blocker
> Fix For: 1.2.0, 1.1.1
>
>
> The {{CheckpointCoordinator}} has a shutdown hook that "shuts down" the 
> savepoint store, rather than suspending it.
> As a consequence, HA checkpoints may be lost when the JobManager process 
> fails but allows the shutdown hook to run.
> I would suggest removing the shutdown hook from the CheckpointCoordinator 
> altogether. The JobManager process is responsible for cleanup and is better 
> placed to decide what should be cleaned up and what not.





[jira] [Closed] (FLINK-4323) Checkpoint Coordinator Removes HA Checkpoints in Shutdown

2016-08-05 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-4323.
---

> Checkpoint Coordinator Removes HA Checkpoints in Shutdown
> -
>
> Key: FLINK-4323
> URL: https://issues.apache.org/jira/browse/FLINK-4323
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.1.0
>Reporter: Stephan Ewen
>Priority: Blocker
> Fix For: 1.2.0, 1.1.1
>
>
> The {{CheckpointCoordinator}} has a shutdown hook that "shuts down" the 
> savepoint store, rather than suspending it.
> As a consequence, HA checkpoints may be lost when the JobManager process 
> fails but allows the shutdown hook to run.
> I would suggest removing the shutdown hook from the CheckpointCoordinator 
> altogether. The JobManager process is responsible for cleanup and is better 
> placed to decide what should be cleaned up and what not.





[jira] [Created] (FLINK-4324) Enable Akka SSL

2016-08-05 Thread Suresh Krishnappa (JIRA)
Suresh Krishnappa created FLINK-4324:


 Summary: Enable Akka SSL
 Key: FLINK-4324
 URL: https://issues.apache.org/jira/browse/FLINK-4324
 Project: Flink
  Issue Type: Sub-task
Reporter: Suresh Krishnappa


This issue is to address part T3-2 (Enable Akka SSL) of the [Secure Data 
Access|https://docs.google.com/document/d/1-GQB6uVOyoaXGwtqwqLV8BHDxWiMO2WnVzBoJ8oPaAs/edit?usp=sharing]
 design doc.








[jira] [Created] (FLINK-4325) Implement Web UI HTTPS

2016-08-05 Thread Suresh Krishnappa (JIRA)
Suresh Krishnappa created FLINK-4325:


 Summary: Implement Web UI HTTPS
 Key: FLINK-4325
 URL: https://issues.apache.org/jira/browse/FLINK-4325
 Project: Flink
  Issue Type: Sub-task
Reporter: Suresh Krishnappa


This issue is to address part T3-5 (Implement Web UI HTTPS) of the [Secure Data 
Access|https://docs.google.com/document/d/1-GQB6uVOyoaXGwtqwqLV8BHDxWiMO2WnVzBoJ8oPaAs/edit?usp=sharing]
 design doc.





[jira] [Updated] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-05 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-4322:

Issue Type: Improvement  (was: Bug)

> Unify CheckpointCoordinator and SavepointCoordinator
> 
>
> Key: FLINK-4322
> URL: https://issues.apache.org/jira/browse/FLINK-4322
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Affects Versions: 1.1.0
>Reporter: Stephan Ewen
> Fix For: 1.2.0
>
>
> The CheckpointCoordinator should handle both checkpoints and savepoints.
> The difference between checkpoints and savepoints is minimal:
>   - savepoints always write the root metadata of the checkpoint
>   - savepoints are always full (never incremental)
> The commonalities are large:
>   - jobs should be able to resume from checkpoints or savepoints
>   - jobs should fall back to the latest checkpoint or savepoint
> This subsumes issue https://issues.apache.org/jira/browse/FLINK-3397





[jira] [Commented] (FLINK-4322) Unify CheckpointCoordinator and SavepointCoordinator

2016-08-05 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409961#comment-15409961
 ] 

Stephan Ewen commented on FLINK-4322:
-

I would like to make some more radical changes to the current checkpoint / 
savepoint design:
  - There should be no difference at all between a checkpoint and a savepoint 
any more.
  - A savepoint is merely a parameterized checkpoint (with a future to announce 
its result).
  - There is no SavepointCoordinator any more.
  - A savepoint restore simply adds a completed checkpoint to the checkpoint 
coordinator.

The execution graph code should become very simple that way.
Since this is a pretty deep change in the checkpointing mechanism, I would be 
happy to actually take this over.
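A hedged sketch of the proposal (names and signatures are ours, not Flink's): the two differences listed in the issue become checkpoint properties, and triggering a savepoint is just triggering a checkpoint with savepoint properties plus a future for the resulting path.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical "parameterized checkpoint" properties: the bullet-point
// differences between checkpoints and savepoints become two flags.
public class CheckpointProperties {
    public final boolean writeRootMetadata; // savepoints: always true
    public final boolean fullSnapshot;      // savepoints: never incremental

    public CheckpointProperties(boolean writeRootMetadata, boolean fullSnapshot) {
        this.writeRootMetadata = writeRootMetadata;
        this.fullSnapshot = fullSnapshot;
    }

    public static CheckpointProperties savepoint() {
        return new CheckpointProperties(true, true);
    }
}

class Coordinator {
    // One trigger path for both: a savepoint is a checkpoint with
    // savepoint properties, and the future announces its result.
    CompletableFuture<String> trigger(CheckpointProperties props) {
        // ... run the ordinary checkpoint machinery with `props` ...
        return CompletableFuture.completedFuture("s3://savepoints/sp-1"); // hypothetical result path
    }
}
```

With this shape there is no separate SavepointCoordinator, and a savepoint restore reduces to registering a completed checkpoint with the coordinator.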

> Unify CheckpointCoordinator and SavepointCoordinator
> 
>
> Key: FLINK-4322
> URL: https://issues.apache.org/jira/browse/FLINK-4322
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.1.0
>Reporter: Stephan Ewen
> Fix For: 1.2.0
>
>
> The CheckpointCoordinator should handle both checkpoints and savepoints.
> The difference between checkpoints and savepoints is minimal:
>   - savepoints always write the root metadata of the checkpoint
>   - savepoints are always full (never incremental)
> The commonalities are large:
>   - jobs should be able to resume from checkpoints or savepoints
>   - jobs should fall back to the latest checkpoint or savepoint
> This subsumes issue https://issues.apache.org/jira/browse/FLINK-3397





[jira] [Created] (FLINK-4326) Flink start-up scripts should optionally start services on the foreground

2016-08-05 Thread Elias Levy (JIRA)
Elias Levy created FLINK-4326:
-

 Summary: Flink start-up scripts should optionally start services 
on the foreground
 Key: FLINK-4326
 URL: https://issues.apache.org/jira/browse/FLINK-4326
 Project: Flink
  Issue Type: Improvement
  Components: Startup Shell Scripts
Affects Versions: 1.0.3
Reporter: Elias Levy


This has previously been mentioned on the mailing list, but has not been 
addressed.  The Flink start-up scripts start the job and task managers in the 
background.  This makes it difficult to integrate Flink with most process 
supervision tools and init systems, including Docker.  One can work around this 
by hacking the scripts or by manually starting the right classes via Java, but 
both are brittle solutions.

In addition to starting the daemons in the foreground, the start-up scripts 
should use exec instead of running the commands, so as to avoid forks.  Many 
supervision tools assume the PID of the process to be monitored is that of the 
process they first execute, and fork chains make it difficult for the supervisor 
to figure out which process to monitor.  Specifically, jobmanager.sh and 
taskmanager.sh should exec flink-daemon.sh, and flink-daemon.sh should exec 
java.





[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-05 Thread delding
Github user delding commented on the issue:

https://github.com/apache/flink/pull/2332
  
BTW, the failing tests are a result of this PR


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2055) Implement Streaming HBaseSink

2016-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410381#comment-15410381
 ] 

ASF GitHub Bot commented on FLINK-2055:
---

Github user delding commented on the issue:

https://github.com/apache/flink/pull/2332
  
BTW, the failing tests are a result of this PR


> Implement Streaming HBaseSink
> -
>
> Key: FLINK-2055
> URL: https://issues.apache.org/jira/browse/FLINK-2055
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming, Streaming Connectors
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Hilmi Yildirim
>
> As per : 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Write-Stream-to-HBase-td1300.html





[GitHub] flink pull request #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-05 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2332#discussion_r73779190
  
--- Diff: flink-streaming-connectors/flink-connector-hbase/pom.xml ---
@@ -0,0 +1,68 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+	<modelVersion>4.0.0</modelVersion>
+
+	<parent>
+		<groupId>org.apache.flink</groupId>
+		<artifactId>flink-streaming-connectors</artifactId>
+		<version>1.1-SNAPSHOT</version>
--- End diff --

We've recently bumped the version to 1.2-SNAPSHOT.




[jira] [Commented] (FLINK-2055) Implement Streaming HBaseSink

2016-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410395#comment-15410395
 ] 

ASF GitHub Bot commented on FLINK-2055:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2332#discussion_r73779190
  
--- Diff: flink-streaming-connectors/flink-connector-hbase/pom.xml ---
@@ -0,0 +1,68 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+	<modelVersion>4.0.0</modelVersion>
+
+	<parent>
+		<groupId>org.apache.flink</groupId>
+		<artifactId>flink-streaming-connectors</artifactId>
+		<version>1.1-SNAPSHOT</version>
--- End diff --

We've recently bumped the version to 1.2-SNAPSHOT.


> Implement Streaming HBaseSink
> -
>
> Key: FLINK-2055
> URL: https://issues.apache.org/jira/browse/FLINK-2055
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming, Streaming Connectors
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Hilmi Yildirim
>
> As per : 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Write-Stream-to-HBase-td1300.html





[GitHub] flink pull request #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-05 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2332#discussion_r73779255
  
--- Diff: 
flink-streaming-connectors/flink-connector-hbase/src/main/java/org/apache/flink/streaming/connectors/hbase/HBaseSink.java
 ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.hbase;
+
+import com.google.common.base.Strings;
+import com.google.common.collect.Lists;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
+import org.apache.flink.util.Preconditions;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.client.Durability;
+import org.apache.hadoop.hbase.client.Increment;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.List;
+
+public class HBaseSink extends RichSinkFunction {
--- End diff --

Since this is public facing, we should have Javadocs for this class.




[jira] [Commented] (FLINK-2055) Implement Streaming HBaseSink

2016-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410397#comment-15410397
 ] 

ASF GitHub Bot commented on FLINK-2055:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2332#discussion_r73779255
  
--- Diff: 
flink-streaming-connectors/flink-connector-hbase/src/main/java/org/apache/flink/streaming/connectors/hbase/HBaseSink.java
 ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.hbase;
+
+import com.google.common.base.Strings;
+import com.google.common.collect.Lists;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
+import org.apache.flink.util.Preconditions;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.client.Durability;
+import org.apache.hadoop.hbase.client.Increment;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.List;
+
+public class HBaseSink extends RichSinkFunction {
--- End diff --

Since this is public facing, we should have Javadocs for this class.


> Implement Streaming HBaseSink
> -
>
> Key: FLINK-2055
> URL: https://issues.apache.org/jira/browse/FLINK-2055
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming, Streaming Connectors
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Hilmi Yildirim
>
> As per : 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Write-Stream-to-HBase-td1300.html





[GitHub] flink pull request #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-05 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2332#discussion_r73779290
  
--- Diff: 
flink-streaming-connectors/flink-connector-hbase/src/main/java/org/apache/flink/streaming/connectors/hbase/HBaseSink.java
 ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.hbase;
+
+import com.google.common.base.Strings;
+import com.google.common.collect.Lists;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
+import org.apache.flink.util.Preconditions;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.client.Durability;
+import org.apache.hadoop.hbase.client.Increment;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.List;
+
+public class HBaseSink extends RichSinkFunction {
+   private static final long serialVersionUID = 1L;
+
+   private static final Logger LOG = 
LoggerFactory.getLogger(HBaseSink.class);
+
+   private transient HBaseClient client;
+   private String tableName;
+   private HBaseMapper mapper;
+   private boolean writeToWAL = true;
+
+   public HBaseSink(String tableName, HBaseMapper mapper) {
--- End diff --

Will need Javadocs for the constructor too :)




[jira] [Commented] (FLINK-2055) Implement Streaming HBaseSink

2016-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410399#comment-15410399
 ] 

ASF GitHub Bot commented on FLINK-2055:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2332#discussion_r73779290
  
--- Diff: 
flink-streaming-connectors/flink-connector-hbase/src/main/java/org/apache/flink/streaming/connectors/hbase/HBaseSink.java
 ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.hbase;
+
+import com.google.common.base.Strings;
+import com.google.common.collect.Lists;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
+import org.apache.flink.util.Preconditions;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.client.Durability;
+import org.apache.hadoop.hbase.client.Increment;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.List;
+
+public class HBaseSink extends RichSinkFunction {
+   private static final long serialVersionUID = 1L;
+
+   private static final Logger LOG = 
LoggerFactory.getLogger(HBaseSink.class);
+
+   private transient HBaseClient client;
+   private String tableName;
+   private HBaseMapper mapper;
+   private boolean writeToWAL = true;
+
+   public HBaseSink(String tableName, HBaseMapper mapper) {
--- End diff --

Will need Javadocs for the constructor too :)


> Implement Streaming HBaseSink
> -
>
> Key: FLINK-2055
> URL: https://issues.apache.org/jira/browse/FLINK-2055
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming, Streaming Connectors
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Hilmi Yildirim
>
> As per : 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Write-Stream-to-HBase-td1300.html





[GitHub] flink pull request #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-05 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2332#discussion_r73779299
  
--- Diff: 
flink-streaming-connectors/flink-connector-hbase/src/main/java/org/apache/flink/streaming/connectors/hbase/HBaseSink.java
 ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.hbase;
+
+import com.google.common.base.Strings;
+import com.google.common.collect.Lists;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
+import org.apache.flink.util.Preconditions;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.client.Durability;
+import org.apache.hadoop.hbase.client.Increment;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.List;
+
+public class HBaseSink extends RichSinkFunction {
+   private static final long serialVersionUID = 1L;
+
+   private static final Logger LOG = 
LoggerFactory.getLogger(HBaseSink.class);
+
+   private transient HBaseClient client;
+   private String tableName;
+   private HBaseMapper mapper;
+   private boolean writeToWAL = true;
+
+   public HBaseSink(String tableName, HBaseMapper mapper) {
+   Preconditions.checkArgument(!Strings.isNullOrEmpty(tableName), 
"Table name cannot be null or empty");
+   Preconditions.checkArgument(mapper != null, "HBase mapper 
cannot be null");
+   this.tableName = tableName;
+   this.mapper = mapper;
+   }
+
+   public HBaseSink writeToWAL(boolean writeToWAL) {
--- End diff --

Again, Javadoc for public facing interfaces.
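As an illustration of what the reviewer is asking for, here is hypothetical Javadoc on a sink-like class (wording and names are ours, not from the PR):

```java
/**
 * A sink that writes incoming records to an HBase table.
 */
public class ExampleSink {
    private final String tableName;
    private boolean writeToWAL = true;

    /**
     * Creates a sink that writes to the given HBase table.
     *
     * @param tableName name of the HBase table to write to
     */
    public ExampleSink(String tableName) {
        this.tableName = tableName;
    }

    /**
     * Enables or disables writing mutations to HBase's write-ahead log.
     * Disabling the WAL trades durability for throughput.
     *
     * @param writeToWAL whether mutations should be written to the WAL
     * @return this sink, for method chaining
     */
    public ExampleSink writeToWAL(boolean writeToWAL) {
        this.writeToWAL = writeToWAL;
        return this;
    }

    /** @return whether mutations are written to the WAL */
    public boolean isWriteToWAL() {
        return writeToWAL;
    }
}
```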




[jira] [Commented] (FLINK-2055) Implement Streaming HBaseSink

2016-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410400#comment-15410400
 ] 

ASF GitHub Bot commented on FLINK-2055:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2332#discussion_r73779299
  
--- Diff: 
flink-streaming-connectors/flink-connector-hbase/src/main/java/org/apache/flink/streaming/connectors/hbase/HBaseSink.java
 ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.hbase;
+
+import com.google.common.base.Strings;
+import com.google.common.collect.Lists;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
+import org.apache.flink.util.Preconditions;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.client.Durability;
+import org.apache.hadoop.hbase.client.Increment;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Put;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.List;
+
+public class HBaseSink<IN> extends RichSinkFunction<IN> {
+   private static final long serialVersionUID = 1L;
+
+   private static final Logger LOG = 
LoggerFactory.getLogger(HBaseSink.class);
+
+   private transient HBaseClient client;
+   private String tableName;
+   private HBaseMapper<IN> mapper;
+   private boolean writeToWAL = true;
+
+   public HBaseSink(String tableName, HBaseMapper<IN> mapper) {
+   Preconditions.checkArgument(!Strings.isNullOrEmpty(tableName), 
"Table name cannot be null or empty");
+   Preconditions.checkArgument(mapper != null, "HBase mapper 
cannot be null");
+   this.tableName = tableName;
+   this.mapper = mapper;
+   }
+
+   public HBaseSink<IN> writeToWAL(boolean writeToWAL) {
--- End diff --

Again, Javadoc for public facing interfaces.
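One plausible shape for the requested Javadoc, as a minimal self-contained sketch. The wording and the `isWriteToWAL` accessor are illustrative additions, not the author's code; the WAL trade-off described is standard HBase behavior.

```java
// Minimal sketch: not the connector's actual class. It shows one possible
// Javadoc for the public builder-style method the reviewer commented on.
public class HBaseSink {
    private boolean writeToWAL = true;

    /**
     * Enables or disables writing to HBase's write-ahead log (WAL) for the
     * mutations produced by this sink. Disabling the WAL can improve write
     * throughput, but risks data loss if a region server fails before its
     * memstore is flushed.
     *
     * @param writeToWAL whether mutations should be written to the WAL (default: true)
     * @return this sink, to allow method chaining
     */
    public HBaseSink writeToWAL(boolean writeToWAL) {
        this.writeToWAL = writeToWAL;
        return this;
    }

    /** Returns whether mutations are currently written to the WAL. */
    public boolean isWriteToWAL() {
        return writeToWAL;
    }
}
```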


> Implement Streaming HBaseSink
> -
>
> Key: FLINK-2055
> URL: https://issues.apache.org/jira/browse/FLINK-2055
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming, Streaming Connectors
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Hilmi Yildirim
>
> As per : 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Write-Stream-to-HBase-td1300.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-05 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2332#discussion_r73779327
  
--- Diff: 
flink-streaming-connectors/flink-connector-hbase/src/main/java/org/apache/flink/streaming/connectors/hbase/HBaseMapper.java
 ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.hbase;
+
+import java.io.Serializable;
+import java.util.List;
+
+/**
+ * Maps an input value to a row in an HBase table.
+ *
+ * @param <IN> input type
+ */
+public interface HBaseMapper<IN> extends Serializable {
+
+   /**
+* Given an input value return the HBase row key.
+*
+* @param value
+* @return row key
+*/
+   byte[] rowKey(IN value);
+
+   /**
+* Given an input value return a list of HBase columns.
+*
+* @param value
+* @return a list of HBase columns
+*/
+   List<HBaseColumn> columns(IN value);
+
+   /**
+*  Represents an HBase column which can be either a standard one or a counter.
+*/
+   class HBaseColumn {
+   private byte[] family;
+   private byte[] qualifier;
+   private byte[] value;
+   private long timestamp;
+   private Long increment;
--- End diff --

I would make these `final`.
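The immutability suggestion above can be sketched as an `HBaseColumn` variant with all fields `final`, set once in a private constructor, with static factories distinguishing the standard and counter flavors. This is a simplified illustration, not the PR's actual class.

```java
// Immutable take on the HBaseColumn from the diff: all fields final, with
// factories for the two flavors the Javadoc mentions (standard vs. counter).
public final class HBaseColumn {
    private final byte[] family;
    private final byte[] qualifier;
    private final byte[] value;     // null for counter columns
    private final long timestamp;
    private final Long increment;   // null for standard columns

    private HBaseColumn(byte[] family, byte[] qualifier, byte[] value,
                        long timestamp, Long increment) {
        this.family = family;
        this.qualifier = qualifier;
        this.value = value;
        this.timestamp = timestamp;
        this.increment = increment;
    }

    /** A standard column carrying a value and a cell timestamp. */
    public static HBaseColumn standard(byte[] family, byte[] qualifier,
                                       byte[] value, long timestamp) {
        return new HBaseColumn(family, qualifier, value, timestamp, null);
    }

    /** A counter column carrying only an increment. */
    public static HBaseColumn counter(byte[] family, byte[] qualifier, long increment) {
        return new HBaseColumn(family, qualifier, null, -1L, increment);
    }

    public boolean isCounter() {
        return increment != null;
    }
}
```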




[GitHub] flink pull request #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-05 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2332#discussion_r73779331
  
--- Diff: 
flink-streaming-connectors/flink-connector-hbase/src/main/java/org/apache/flink/streaming/connectors/hbase/HBaseClient.java
 ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.hbase;
+
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.Connection;
+import org.apache.hadoop.hbase.client.ConnectionFactory;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Table;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.List;
+
+public class HBaseClient implements Closeable {
--- End diff --

Can you also add a simple Javadoc for this class?




[jira] [Commented] (FLINK-2055) Implement Streaming HBaseSink

2016-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410403#comment-15410403
 ] 

ASF GitHub Bot commented on FLINK-2055:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2332#discussion_r73779331
  
--- Diff: 
flink-streaming-connectors/flink-connector-hbase/src/main/java/org/apache/flink/streaming/connectors/hbase/HBaseClient.java
 ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.hbase;
+
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.Connection;
+import org.apache.hadoop.hbase.client.ConnectionFactory;
+import org.apache.hadoop.hbase.client.Mutation;
+import org.apache.hadoop.hbase.client.Table;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.List;
+
+public class HBaseClient implements Closeable {
--- End diff --

Can you also add a simple Javadoc for this class?


> Implement Streaming HBaseSink
> -
>
> Key: FLINK-2055
> URL: https://issues.apache.org/jira/browse/FLINK-2055
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming, Streaming Connectors
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Hilmi Yildirim
>
> As per : 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Write-Stream-to-HBase-td1300.html





[jira] [Commented] (FLINK-2055) Implement Streaming HBaseSink

2016-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410402#comment-15410402
 ] 

ASF GitHub Bot commented on FLINK-2055:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2332#discussion_r73779327
  
--- Diff: 
flink-streaming-connectors/flink-connector-hbase/src/main/java/org/apache/flink/streaming/connectors/hbase/HBaseMapper.java
 ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.hbase;
+
+import java.io.Serializable;
+import java.util.List;
+
+/**
+ * Maps an input value to a row in an HBase table.
+ *
+ * @param <IN> input type
+ */
+public interface HBaseMapper<IN> extends Serializable {
+
+   /**
+* Given an input value return the HBase row key.
+*
+* @param value
+* @return row key
+*/
+   byte[] rowKey(IN value);
+
+   /**
+* Given an input value return a list of HBase columns.
+*
+* @param value
+* @return a list of HBase columns
+*/
+   List<HBaseColumn> columns(IN value);
+
+   /**
+*  Represents an HBase column which can be either a standard one or a counter.
+*/
+   class HBaseColumn {
+   private byte[] family;
+   private byte[] qualifier;
+   private byte[] value;
+   private long timestamp;
+   private Long increment;
--- End diff --

I would make these `final`.


> Implement Streaming HBaseSink
> -
>
> Key: FLINK-2055
> URL: https://issues.apache.org/jira/browse/FLINK-2055
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming, Streaming Connectors
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Hilmi Yildirim
>
> As per : 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Write-Stream-to-HBase-td1300.html





[GitHub] flink pull request #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-05 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2332#discussion_r73779379
  
--- Diff: 
flink-streaming-connectors/flink-connector-hbase/src/main/java/org/apache/flink/streaming/connectors/hbase/HBaseSink.java
 ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.hbase;
+
+import com.google.common.base.Strings;
+import com.google.common.collect.Lists;
--- End diff --

Can you try removing the Guava dependencies? In Flink we try to avoid using 
Guava because of dependency issues.
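A sketch of what dropping Guava could look like here: the two imports in question (`Strings.isNullOrEmpty` and `Lists.newArrayList`) have trivial plain-JDK equivalents, and the diff already imports `org.apache.flink.util.Preconditions` for argument checks. Self-contained illustration, not connector code; the class name `GuavaFree` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class GuavaFree {

    // Plain-JDK replacement for com.google.common.base.Strings.isNullOrEmpty
    public static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    // Plain-JDK replacement for com.google.common.collect.Lists.newArrayList:
    // the diamond operator makes the factory method unnecessary.
    public static <T> List<T> newArrayList() {
        return new ArrayList<>();
    }
}
```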




[jira] [Commented] (FLINK-2055) Implement Streaming HBaseSink

2016-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410405#comment-15410405
 ] 

ASF GitHub Bot commented on FLINK-2055:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2332#discussion_r73779379
  
--- Diff: 
flink-streaming-connectors/flink-connector-hbase/src/main/java/org/apache/flink/streaming/connectors/hbase/HBaseSink.java
 ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.hbase;
+
+import com.google.common.base.Strings;
+import com.google.common.collect.Lists;
--- End diff --

Can you try removing the Guava dependencies? In Flink we try to avoid using 
Guava because of dependency issues.


> Implement Streaming HBaseSink
> -
>
> Key: FLINK-2055
> URL: https://issues.apache.org/jira/browse/FLINK-2055
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming, Streaming Connectors
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Hilmi Yildirim
>
> As per : 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Write-Stream-to-HBase-td1300.html





[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-05 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2332
  
@delding Thank you for working on this contribution. Since I'm not that 
familiar with the HBase client API, I've only skimmed through the code for now. 
Will try to review in detail later.

Btw, since there are no tests for this PR yet, can you describe how you 
tested it? An example of using the `HBaseSink` will also be helpful (we'd also 
need to add documentation for the sink, like the other connectors).
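Since a usage example is requested: below is a stand-alone sketch of the mapper side of such an example, with the `HBaseMapper` interface inlined and simplified (columns reduced to plain strings) so the snippet compiles on its own. In the real connector the mapper would be passed to the `HBaseSink` constructor and attached to a `DataStream`; `Event` and `EventMapper` are hypothetical names.

```java
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

public class MapperExample {

    // Inlined, simplified stand-in for the connector's HBaseMapper interface.
    interface HBaseMapper<IN> extends Serializable {
        byte[] rowKey(IN value);
        List<String> columns(IN value);
    }

    // A hypothetical event type produced by an upstream stream.
    static class Event {
        final String userId;
        final long clicks;
        Event(String userId, long clicks) { this.userId = userId; this.clicks = clicks; }
    }

    // Maps each event to a row keyed by user id with one column.
    static class EventMapper implements HBaseMapper<Event> {
        @Override
        public byte[] rowKey(Event e) {
            return e.userId.getBytes(StandardCharsets.UTF_8);
        }
        @Override
        public List<String> columns(Event e) {
            // family:qualifier=value, flattened to a string for this sketch
            return Collections.singletonList("stats:clicks=" + e.clicks);
        }
    }
}
```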




[jira] [Commented] (FLINK-2055) Implement Streaming HBaseSink

2016-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410417#comment-15410417
 ] 

ASF GitHub Bot commented on FLINK-2055:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2332
  
@delding Thank you for working on this contribution. Since I'm not that 
familiar with the HBase client API, I've only skimmed through the code for now. 
Will try to review in detail later.

Btw, since there are no tests for this PR yet, can you describe how you 
tested it? An example of using the `HBaseSink` will also be helpful (we'd also 
need to add documentation for the sink, like the other connectors).


> Implement Streaming HBaseSink
> -
>
> Key: FLINK-2055
> URL: https://issues.apache.org/jira/browse/FLINK-2055
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming, Streaming Connectors
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Hilmi Yildirim
>
> As per : 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Write-Stream-to-HBase-td1300.html





[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-05 Thread delding
Github user delding commented on the issue:

https://github.com/apache/flink/pull/2332
  
Hi @tzulitai Thank you for all the comments :-) I will update the PR and 
add an example in the next few days




[jira] [Commented] (FLINK-2055) Implement Streaming HBaseSink

2016-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410447#comment-15410447
 ] 

ASF GitHub Bot commented on FLINK-2055:
---

Github user delding commented on the issue:

https://github.com/apache/flink/pull/2332
  
Hi @tzulitai Thank you for all the comments :-) I will update the PR and 
add an example in the next few days


> Implement Streaming HBaseSink
> -
>
> Key: FLINK-2055
> URL: https://issues.apache.org/jira/browse/FLINK-2055
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming, Streaming Connectors
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Hilmi Yildirim
>
> As per : 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Write-Stream-to-HBase-td1300.html





[jira] [Updated] (FLINK-4266) Remote Storage Statebackend

2016-08-05 Thread Chen Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qin updated FLINK-4266:

Summary: Remote Storage Statebackend  (was: Cassandra StateBackend)

> Remote Storage Statebackend
> ---
>
> Key: FLINK-4266
> URL: https://issues.apache.org/jira/browse/FLINK-4266
> Project: Flink
>  Issue Type: New Feature
>  Components: State Backends, Checkpointing
>Affects Versions: 1.0.3, 1.2.0
>Reporter: Chen Qin
>Priority: Minor
>
> Current FileSystem statebackend limits whole state size to disk space. 
> For long running task that hold window content for long period of time, it 
> needs to split out states to durable remote storage and replicated across 
> data centers.
> We look into implementation from leverage checkpoint timestamp as versioning 
> and do range query to get current state; we also want to reduce "hot states" 
> hitting remote db per every update between adjacent checkpoints by tracking 
> update logs and merge them, do batch updates only when checkpoint; lastly, we 
> are looking for eviction policy that can identify "hot keys" in k/v state and 
> lazy load those "cold keys" from Cassandra.
> For now, we don't have good story regarding to data retirement. We might have 
> to keep forever until manually run command and clean per job related state 
> data. Some of features might related to incremental checkpointing feature, we 
> hope to align with effort there also.
> Welcome comments, I will try to put a draft design doc after gathering some 
> feedback.





[jira] [Updated] (FLINK-4266) Remote Database Statebackend

2016-08-05 Thread Chen Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qin updated FLINK-4266:

Summary: Remote Database Statebackend  (was: Remote Storage Statebackend)

> Remote Database Statebackend
> 
>
> Key: FLINK-4266
> URL: https://issues.apache.org/jira/browse/FLINK-4266
> Project: Flink
>  Issue Type: New Feature
>  Components: State Backends, Checkpointing
>Affects Versions: 1.0.3, 1.2.0
>Reporter: Chen Qin
>Priority: Minor
>
> Current FileSystem statebackend limits whole state size to disk space. 
> Dealing with scale out checkpoint states beyond local disk space such as long 
> running task that hold window content for long period of time. Pipelines 
> needs to split out states to durable remote storage even replicated to 
> different data centers.
> We draft a design that leverage checkpoint id as mono incremental logic 
> timestamp and perform range query to get evicited state k/v. we also 
> introduce checkpoint time commit and eviction threshold that reduce "hot 
> states" hitting remote db per every update between adjacent checkpoints by 
> tracking update logs and merge them, do batch updates only when checkpoint; 
> lastly, we are looking for eviction policy that can identify "hot keys" in 
> k/v state and lazy load those "cold keys" from remote storage(e.g Cassandra).
> For now, we don't have good story regarding to data retirement. We might have 
> to keep forever until manually run command and clean per job related state 
> data. Some of features might related to incremental checkpointing feature, we 
> hope to align with effort there also.
> Welcome comments, I will try to put a draft design doc after gathering some 
> feedback.





[jira] [Updated] (FLINK-4266) Remote Storage Statebackend

2016-08-05 Thread Chen Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qin updated FLINK-4266:

Description: 
The current FileSystem state backend limits total state size to the available 
local disk space. Long-running tasks that hold window contents for long periods 
can grow checkpoint state beyond local disk, so pipelines need to split state 
out to durable remote storage, possibly replicated to different data centers.

We drafted a design that leverages the checkpoint id as a monotonically 
increasing logical timestamp and performs range queries to fetch evicted state 
k/v pairs. We also introduce a checkpoint-time commit and an eviction threshold 
that reduce "hot state" writes hitting the remote db on every update between 
adjacent checkpoints: update logs are tracked and merged, and batch updates are 
done only at checkpoint time. Lastly, we are looking for an eviction policy 
that can identify "hot keys" in k/v state and lazily load "cold keys" from 
remote storage (e.g. Cassandra).

For now, we don't have a good story regarding data retirement; we might have to 
keep state forever until a command is run manually to clean up per-job state 
data. Some of these features may relate to the incremental checkpointing 
feature, and we hope to align with that effort as well.

Comments welcome; I will try to put up a draft design doc after gathering some 
feedback.
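The batching idea in the description (track update logs between checkpoints, merge them, and write to the remote store only at checkpoint time) can be sketched as a stand-alone structure. This is illustrative only, not Flink or connector code; the name `WriteBehindBuffer` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Write-behind buffer: coalesces per-key updates between checkpoints so the
// remote store sees one batched write per key per checkpoint interval.
public class WriteBehindBuffer<K, V> {
    private final Map<K, V> pending = new HashMap<>();

    // Called on every element; later updates to the same key replace
    // earlier ones, which is the "merge update logs" step.
    public void put(K key, V value) {
        pending.put(key, value);
    }

    // Called when a checkpoint completes: hand back the merged batch for
    // the remote write and reset for the next checkpoint interval.
    public Map<K, V> flush() {
        Map<K, V> batch = new HashMap<>(pending);
        pending.clear();
        return batch;
    }
}
```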







  was:
Current FileSystem statebackend limits whole state size to disk space. 
For long running task that hold window content for long period of time, it 
needs to split out states to durable remote storage and replicated across data 
centers.

We look into implementation from leverage checkpoint timestamp as versioning 
and do range query to get current state; we also want to reduce "hot states" 
hitting remote db per every update between adjacent checkpoints by tracking 
update logs and merge them, do batch updates only when checkpoint; lastly, we 
are looking for eviction policy that can identify "hot keys" in k/v state and 
lazy load those "cold keys" from Cassandra.

For now, we don't have good story regarding to data retirement. We might have 
to keep forever until manually run command and clean per job related state 
data. Some of features might related to incremental checkpointing feature, we 
hope to align with effort there also.

Welcome comments, I will try to put a draft design doc after gathering some 
feedback.








> Remote Storage Statebackend
> ---
>
> Key: FLINK-4266
> URL: https://issues.apache.org/jira/browse/FLINK-4266
> Project: Flink
>  Issue Type: New Feature
>  Components: State Backends, Checkpointing
>Affects Versions: 1.0.3, 1.2.0
>Reporter: Chen Qin
>Priority: Minor
>
> Current FileSystem statebackend limits whole state size to disk space. 
> Dealing with scale out checkpoint states beyond local disk space such as long 
> running task that hold window content for long period of time. Pipelines 
> needs to split out states to durable remote storage even replicated to 
> different data centers.
> We draft a design that leverage checkpoint id as mono incremental logic 
> timestamp and perform range query to get evicited state k/v. we also 
> introduce checkpoint time commit and eviction threshold that reduce "hot 
> states" hitting remote db per every update between adjacent checkpoints by 
> tracking update logs and merge them, do batch updates only when checkpoint; 
> lastly, we are looking for eviction policy that can identify "hot keys" in 
> k/v state and lazy load those "cold keys" from remote storage(e.g Cassandra).
> For now, we don't have good story regarding to data retirement. We might have 
> to keep forever until manually run command and clean per job related state 
> data. Some of features might related to incremental checkpointing feature, we 
> hope to align with effort there also.
> Welcome comments, I will try to put a draft design doc after gathering some 
> feedback.





[jira] [Commented] (FLINK-1638) SimpleKafkaSource connector

2016-08-05 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410471#comment-15410471
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-1638:


Can we close this, as it seems to be resolved already in 
{{FlinkKafkaConsumer08}}?

> SimpleKafkaSource connector
> ---
>
> Key: FLINK-1638
> URL: https://issues.apache.org/jira/browse/FLINK-1638
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 0.9
>Reporter: Márton Balassi
>Assignee: Márton Balassi
>
> Support for Kafka's low level API connector. This is needed to support late 
> commits and replays.
> [1] 
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example





[jira] [Commented] (FLINK-4266) Remote Database Statebackend

2016-08-05 Thread Chen Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410474#comment-15410474
 ] 

Chen Qin commented on FLINK-4266:
-

https://docs.google.com/document/d/1diHQyOPZVxgmnmYfiTa6glLf-FlFjSHcL8J3YR2xLdk/edit?usp=sharing


> Remote Database Statebackend
> 
>
> Key: FLINK-4266
> URL: https://issues.apache.org/jira/browse/FLINK-4266
> Project: Flink
>  Issue Type: New Feature
>  Components: State Backends, Checkpointing
>Affects Versions: 1.0.3, 1.2.0
>Reporter: Chen Qin
>Priority: Minor
>
> Current FileSystem statebackend limits whole state size to disk space. 
> Dealing with scale out checkpoint states beyond local disk space such as long 
> running task that hold window content for long period of time. Pipelines 
> needs to split out states to durable remote storage even replicated to 
> different data centers.
> We draft a design that leverage checkpoint id as mono incremental logic 
> timestamp and perform range query to get evicited state k/v. we also 
> introduce checkpoint time commit and eviction threshold that reduce "hot 
> states" hitting remote db per every update between adjacent checkpoints by 
> tracking update logs and merge them, do batch updates only when checkpoint; 
> lastly, we are looking for eviction policy that can identify "hot keys" in 
> k/v state and lazy load those "cold keys" from remote storage(e.g Cassandra).
> For now, we don't have good story regarding to data retirement. We might have 
> to keep forever until manually run command and clean per job related state 
> data. Some of features might related to incremental checkpointing feature, we 
> hope to align with effort there also.
> Welcome comments, I will try to put a draft design doc after gathering some 
> feedback.





[jira] [Commented] (FLINK-3298) Streaming connector for ActiveMQ

2016-08-05 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410494#comment-15410494
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-3298:


Formally defining how we deal with new connector contributions (batch, 
streaming, state-backends) is a long overdue task for Flink. As Flink is 
growing in popularity, we can expect more of these contributions to come, and 
it's important that we clearly define a decision process to provide a better 
experience for new contributors working on them.

Some initial thoughts:

 1. For every new connector contribution, we should have a set of criteria to 
decide how we should deal with them. The final decision can either be "link on 
Flink's community page", "collect in flink-connectors/flink-contrib", or "core 
flink repository".

2. The initial discussion can happen on the dev mailing lists, and only once 
the decision has been voted on and made by the core committers should a 
corresponding JIRA be opened for contributors to pick up (connector 
contributions which haven't gone through this process are to be rejected). I 
think the questions Stephan asked can be a good starting point for the deciding 
criteria.

Currently we also have a few other JIRAs for other connectors (some with PRs 
already). We'd probably need to go through them before proceeding with them 
(some of them already have discussions on how they should be dealt with).

> Streaming connector for ActiveMQ
> 
>
> Key: FLINK-3298
> URL: https://issues.apache.org/jira/browse/FLINK-3298
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Mohit Sethi
>Assignee: Ivan Mushketyk
>Priority: Minor
>






[jira] [Commented] (FLINK-3298) Streaming connector for ActiveMQ

2016-08-05 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410496#comment-15410496
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-3298:


Formally defining how we deal with new connector contributions (batch, 
streaming, state-backends) is a long overdue task for Flink. As Flink is 
growing in popularity, we can expect more of these contributions to come, and 
it's important that we clearly define a decision process to provide a better 
experience for new contributors working on them.

Some initial thoughts:

1. For every new connector contribution, we should have a set of criteria to 
decide how to deal with it. The final decision can be either "link on 
Flink's community page", "collect in flink-connectors/flink-contrib", or "core 
Flink repository".

2. The initial discussion can happen on the dev mailing lists, and only once 
the decision has been voted on and made by the core committers should a 
corresponding JIRA be opened for contributors to pick up (connector 
contributions that haven't gone through this process are to be rejected). I 
think the questions Stephan asked can be a good starting point for the 
deciding criteria.

Currently we also have a few other JIRAs for other connectors (some with PRs 
already). We'd probably need to go through those before proceeding (some of 
them already have discussions on how they should be dealt with).

> Streaming connector for ActiveMQ
> 
>
> Key: FLINK-3298
> URL: https://issues.apache.org/jira/browse/FLINK-3298
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Mohit Sethi
>Assignee: Ivan Mushketyk
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (FLINK-3298) Streaming connector for ActiveMQ

2016-08-05 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-3298:
---
Comment: was deleted

(was: Formally defining how we deal with new connector contributions (batch, 
streaming, state-backends) is a long overdue task for Flink. As Flink is 
growing in popularity, we can expect more of these contributions to come, and 
it's important that we clearly define a decision process to provide a better 
experience for new contributors working on them.

Some initial thoughts:

1. For every new connector contribution, we should have a set of criteria to 
decide how to deal with it. The final decision can be either "link on 
Flink's community page", "collect in flink-connectors/flink-contrib", or "core 
Flink repository".

2. The initial discussion can happen on the dev mailing lists, and only once 
the decision has been voted on and made by the core committers should a 
corresponding JIRA be opened for contributors to pick up (connector 
contributions that haven't gone through this process are to be rejected). I 
think the questions Stephan asked can be a good starting point for the 
deciding criteria.

Currently we also have a few other JIRAs for other connectors (some with PRs 
already). We'd probably need to go through those before proceeding (some of 
them already have discussions on how they should be dealt with).)

> Streaming connector for ActiveMQ
> 
>
> Key: FLINK-3298
> URL: https://issues.apache.org/jira/browse/FLINK-3298
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Mohit Sethi
>Assignee: Ivan Mushketyk
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

