[jira] [Updated] (FLINK-20041) Support Watermark push down for kafka connector

2020-11-06 Thread Shengkai Fang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shengkai Fang updated FLINK-20041:
--
Parent: FLINK-16987
Issue Type: Sub-task  (was: New Feature)

> Support Watermark push down for kafka connector
> ---
>
> Key: FLINK-20041
> URL: https://issues.apache.org/jira/browse/FLINK-20041
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Kafka, Table SQL / API
>Affects Versions: 1.12.0
>Reporter: Shengkai Fang
>Priority: Major
>
> Support watermark push-down for the Kafka connector.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #13957: [FLINK-19823][table][fs-connector] Filesystem connector supports de/serialization schema

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13957:
URL: https://github.com/apache/flink/pull/13957#issuecomment-722961884


   
   ## CI report:
   
   * 0c4e62b055b5cd62d95be004423262ed9ce5eb86 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9235)
 
   * f9c06b641c424a4e3f4e91ecc668b599dd45b6e1 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9240)
 
   * 41b8b8583a8b8e08979515d6bcb1475fdd231664 UNKNOWN
   * e71978fc6c6da487d3b75da6a3fa8b86d6bd2f2e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9242)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-20041) Support Watermark push down for kafka connector

2020-11-06 Thread Shengkai Fang (Jira)
Shengkai Fang created FLINK-20041:
-

 Summary: Support Watermark push down for kafka connector
 Key: FLINK-20041
 URL: https://issues.apache.org/jira/browse/FLINK-20041
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / Kafka, Table SQL / API
Affects Versions: 1.12.0
Reporter: Shengkai Fang


Support watermark push-down for the Kafka connector.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #13296: [FLINK-18774][format][debezium] Support debezium-avro format

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13296:
URL: https://github.com/apache/flink/pull/13296#issuecomment-684611193


   
   ## CI report:
   
   * c266341c63cc4cdc28a6585a82294710fd8ebbc5 UNKNOWN
   * c3b8c47d4aac9dbc1b425bdef346e084febd4a2d UNKNOWN
   *  Unknown: [CANCELED](TBD) 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-10240) Pluggable scheduling strategy for batch jobs

2020-11-06 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-10240.
---
Resolution: Later

> Pluggable scheduling strategy for batch jobs
> 
>
> Key: FLINK-10240
> URL: https://issues.apache.org/jira/browse/FLINK-10240
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Priority: Major
>  Labels: scheduling
>
> Currently batch jobs are scheduled with the LAZY_FROM_SOURCES strategy: source 
> tasks are scheduled in the beginning, and other tasks are scheduled once 
> their input data is consumable.
> However, input data being consumable does not always mean the task can work 
> at once. 
>  
> One example is the hash join operation, where the operator first consumes one 
> side (we call it the build side) to set up a table, then consumes the other 
> side (we call it the probe side) to do the real join work. If the probe side 
> is started early, it just gets stuck on back pressure, as the join operator 
> will not consume data from it before the build stage is done, causing a waste 
> of resources.
> If we have the probe side task started after the build stage is done, both 
> the build and probe sides can have more computing resources, as they are 
> staggered.
>  
> That's why we think a flexible scheduling strategy is needed, allowing job 
> owners to customize the vertex scheduling order and constraints. Better 
> resource utilization usually means better performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-10240) Pluggable scheduling strategy for batch jobs

2020-11-06 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227751#comment-17227751
 ] 

Zhu Zhu commented on FLINK-10240:
-

Let's close this ticket because the pluggable scheduling strategy it proposed 
is almost done in FLINK-10429.
It's not fully done yet, but we can open separate tickets to further improve 
it once the scheduler code has reached a more stable state.

> Pluggable scheduling strategy for batch jobs
> 
>
> Key: FLINK-10240
> URL: https://issues.apache.org/jira/browse/FLINK-10240
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Priority: Major
>  Labels: scheduling
>
> Currently batch jobs are scheduled with the LAZY_FROM_SOURCES strategy: source 
> tasks are scheduled in the beginning, and other tasks are scheduled once 
> their input data is consumable.
> However, input data being consumable does not always mean the task can work 
> at once. 
>  
> One example is the hash join operation, where the operator first consumes one 
> side (we call it the build side) to set up a table, then consumes the other 
> side (we call it the probe side) to do the real join work. If the probe side 
> is started early, it just gets stuck on back pressure, as the join operator 
> will not consume data from it before the build stage is done, causing a waste 
> of resources.
> If we have the probe side task started after the build stage is done, both 
> the build and probe sides can have more computing resources, as they are 
> staggered.
>  
> That's why we think a flexible scheduling strategy is needed, allowing job 
> owners to customize the vertex scheduling order and constraints. Better 
> resource utilization usually means better performance.
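
For readers unfamiliar with the idea, here is a minimal sketch of what 
"pluggable" means in this context. The names are illustrative only; Flink's 
real strategy interface in org.apache.flink.runtime.scheduler.strategy differs 
in detail. The point is that the scheduler delegates the decision of which 
vertices to deploy, and when, to a strategy object:

{code:java}
// Illustrative sketch only -- not Flink's actual interface.
public interface SchedulingStrategy {

    /** Kick off scheduling, e.g. deploy all source vertices. */
    void startScheduling();

    /**
     * Called when a result partition becomes consumable, so the strategy can
     * decide what to deploy next -- e.g. delay the probe side of a hash join
     * until the build side has finished producing its data.
     */
    void onPartitionConsumable(String resultPartitionId);
}
{code}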



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19963) Give SinkWriter access to processing-time service

2020-11-06 Thread Aljoscha Krettek (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-19963:
-
Description: Some sinks, like the new unified {{FileSink}}, will need 
low-level processing-time timers to determine Bucket inactivity, among other 
things.  (was: Some `Sink`s need to register a `TimeService`, for example 
`StreamingFileWriter`.

So this PR exposes the `TimeService` to the `SinkWriter`.)

> Give SinkWriter access to processing-time service
> -
>
> Key: FLINK-19963
> URL: https://issues.apache.org/jira/browse/FLINK-19963
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream
>Reporter: Guowei Ma
>Priority: Major
>  Labels: pull-request-available
>
> Some sinks, like the new unified {{FileSink}}, will need low-level 
> processing-time timers to determine Bucket inactivity, among other things.
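
A rough sketch of what such access could look like (all names below are 
assumptions for illustration, not the final API):

{code:java}
// Sketch only -- interface and method names are assumptions, not the final API.
public interface ProcessingTimeService {
    long getCurrentProcessingTime();
    void registerProcessingTimer(long timestamp, Runnable callback);
}

// A writer could use it to periodically close buckets that went inactive.
class BucketingWriter {
    private static final long INACTIVITY_INTERVAL_MS = 60_000L;
    private final ProcessingTimeService timeService;

    BucketingWriter(ProcessingTimeService timeService) {
        this.timeService = timeService;
        scheduleNextCheck();
    }

    private void scheduleNextCheck() {
        long now = timeService.getCurrentProcessingTime();
        timeService.registerProcessingTimer(now + INACTIVITY_INTERVAL_MS, () -> {
            // ... close buckets that received no records since the last check ...
            scheduleNextCheck(); // re-arm the timer
        });
    }
}
{code}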



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19963) Give SinkWriter access to processing-time service

2020-11-06 Thread Aljoscha Krettek (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-19963:
-
Summary: Give SinkWriter access to processing-time service  (was: Let the 
`SinkWriter` support using the `TimerService`)

> Give SinkWriter access to processing-time service
> -
>
> Key: FLINK-19963
> URL: https://issues.apache.org/jira/browse/FLINK-19963
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream
>Reporter: Guowei Ma
>Priority: Major
>  Labels: pull-request-available
>
> Some `Sink`s need to register a `TimeService`, for example `StreamingFileWriter`.
> So this PR exposes the `TimeService` to the `SinkWriter`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20031) Keep the UID of SinkWriter same as the SinkTransformation

2020-11-06 Thread Aljoscha Krettek (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-20031:
-
Summary: Keep the UID of SinkWriter same as the SinkTransformation  (was: 
Keep the uid of SinkWriter same as the SinkTransformation)

> Keep the UID of SinkWriter same as the SinkTransformation
> -
>
> Key: FLINK-20031
> URL: https://issues.apache.org/jira/browse/FLINK-20031
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream
>Reporter: Guowei Ma
>Priority: Major
>  Labels: pull-request-available
>
> In the case that we want to migrate the StreamingFileSink to the new sink 
> API, we might need to let the user set the SinkWriter's UID to be the same as 
> the StreamingFileSink's, so that the SinkWriter operator has the opportunity 
> to reuse the old state. (This is just an option.)
>  
> For this we need to make the SinkWriter operator's UID the same as the 
> SinkTransformation's.
>  
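
In DataStream terms the intent is roughly the following sketch ({{sinkTo}} is 
the new unified sink entry point; the uid value and the sink variable are 
placeholders):

{code:java}
import org.apache.flink.api.connector.sink.Sink;
import org.apache.flink.streaming.api.datastream.DataStream;

class Example {
    // Sketch: give the new sink's writer operator the uid the old
    // StreamingFileSink operator had, so its state can be restored.
    static <T> void attachSink(DataStream<T> stream, Sink<T, ?, ?, ?> newFileSink) {
        stream
            .sinkTo(newFileSink)
            .uid("my-streaming-file-sink"); // uid previously used by StreamingFileSink
    }
}
{code}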



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #13958: [FLINK-19994][runtime] Do not force iteration job to generate one only pipelined region

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13958:
URL: https://github.com/apache/flink/pull/13958#issuecomment-722993305


   
   ## CI report:
   
   * e6c7892e55dd4baaff4968e4257a63a51023d841 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9191)
 
   * c1b246f5f8538f9bf313c9456a04aac3fdf54b67 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9244)
 
   * 964580e303ef47cb868c3cfe5e3d0fb855c0c2a3 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13931: [FLINK-19811][table-planner] Introduce RexSimplify to simplify SEARCH conditions

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13931:
URL: https://github.com/apache/flink/pull/13931#issuecomment-722165982


   
   ## CI report:
   
   * 1bfce4d4cf3ad25b18bbcc654fcf89837fbefcbf Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9233)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13864: [FLINK-19543][k8s] Implement RunningJobsRegistry, JobGraphStore based on Kubernetes API

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13864:
URL: https://github.com/apache/flink/pull/13864#issuecomment-719924065


   
   ## CI report:
   
   * 8afe5fbf796d216f655dd6e7ad7585f328196f54 UNKNOWN
   * a27c2c7198ec1cbaf7ad5d84c5d0562033b60e4e Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9232)
 
   * 11f723158926cc5ddde4c3f80c0fde4f8b0e5910 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9238)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13296: [FLINK-18774][format][debezium] Support debezium-avro format

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13296:
URL: https://github.com/apache/flink/pull/13296#issuecomment-684611193


   
   ## CI report:
   
   * c266341c63cc4cdc28a6585a82294710fd8ebbc5 UNKNOWN
   *  Unknown: [CANCELED](TBD) 
   * c3b8c47d4aac9dbc1b425bdef346e084febd4a2d UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] caozhen1937 commented on pull request #13296: [FLINK-18774][format][debezium] Support debezium-avro format

2020-11-06 Thread GitBox


caozhen1937 commented on pull request #13296:
URL: https://github.com/apache/flink/pull/13296#issuecomment-723402849


   @flinkbot run azure



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhuzhurk commented on a change in pull request #13958: [FLINK-19994][runtime] Do not force iteration job to generate one only pipelined region

2020-11-06 Thread GitBox


zhuzhurk commented on a change in pull request #13958:
URL: https://github.com/apache/flink/pull/13958#discussion_r519145694



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adapter/DefaultExecutionTopology.java
##
@@ -91,6 +88,8 @@ public DefaultExecutionTopology(ExecutionGraph graph) {
this.pipelinedRegionsByVertex = new HashMap<>();
this.pipelinedRegions = new ArrayList<>();
initializePipelinedRegions();
+
+   ensureCoLocatedVerticesInSameRegion();

Review comment:
   done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13958: [FLINK-19994][runtime] Do not force iteration job to generate one only pipelined region

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13958:
URL: https://github.com/apache/flink/pull/13958#issuecomment-722993305


   
   ## CI report:
   
   * e6c7892e55dd4baaff4968e4257a63a51023d841 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9191)
 
   * c1b246f5f8538f9bf313c9456a04aac3fdf54b67 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13974: [hotfix][doc] Fix a concurrent issue in testing.md

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13974:
URL: https://github.com/apache/flink/pull/13974#issuecomment-723401121


   
   ## CI report:
   
   * 0073a428d1b41297708cbd146482e1d8509174cd Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9243)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13957: [FLINK-19823][table][fs-connector] Filesystem connector supports de/serialization schema

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13957:
URL: https://github.com/apache/flink/pull/13957#issuecomment-722961884


   
   ## CI report:
   
   * eded6a80f900965a008691d061057dae9c0d7cfe Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9152)
 
   * 0c4e62b055b5cd62d95be004423262ed9ce5eb86 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9235)
 
   * f9c06b641c424a4e3f4e91ecc668b599dd45b6e1 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9240)
 
   * 41b8b8583a8b8e08979515d6bcb1475fdd231664 UNKNOWN
   * e71978fc6c6da487d3b75da6a3fa8b86d6bd2f2e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9242)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhuzhurk commented on a change in pull request #13958: [FLINK-19994][runtime] Do not force iteration job to generate one only pipelined region

2020-11-06 Thread GitBox


zhuzhurk commented on a change in pull request #13958:
URL: https://github.com/apache/flink/pull/13958#discussion_r519143747



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adapter/DefaultExecutionTopology.java
##
@@ -91,6 +88,8 @@ public DefaultExecutionTopology(ExecutionGraph graph) {
this.pipelinedRegionsByVertex = new HashMap<>();
this.pipelinedRegions = new ArrayList<>();
initializePipelinedRegions();
+
+   ensureCoLocatedVerticesInSameRegion();

Review comment:
   Yes I agree that publishing `this` in the constructor is not a good idea.
   Will add a separate commit to fix it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #13974: [hotfix][doc] Fix a concurrent issue in testing.md

2020-11-06 Thread GitBox


flinkbot commented on pull request #13974:
URL: https://github.com/apache/flink/pull/13974#issuecomment-723401121


   
   ## CI report:
   
   * 0073a428d1b41297708cbd146482e1d8509174cd UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13957: [FLINK-19823][table][fs-connector] Filesystem connector supports de/serialization schema

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13957:
URL: https://github.com/apache/flink/pull/13957#issuecomment-722961884


   
   ## CI report:
   
   * eded6a80f900965a008691d061057dae9c0d7cfe Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9152)
 
   * 0c4e62b055b5cd62d95be004423262ed9ce5eb86 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9235)
 
   * f9c06b641c424a4e3f4e91ecc668b599dd45b6e1 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9240)
 
   * 41b8b8583a8b8e08979515d6bcb1475fdd231664 UNKNOWN
   * e71978fc6c6da487d3b75da6a3fa8b86d6bd2f2e UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhuzhurk commented on a change in pull request #13958: [FLINK-19994][runtime] Do not force iteration job to generate one only pipelined region

2020-11-06 Thread GitBox


zhuzhurk commented on a change in pull request #13958:
URL: https://github.com/apache/flink/pull/13958#discussion_r519129436



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adapter/DefaultExecutionTopology.java
##
@@ -201,4 +195,32 @@ private static void connectVerticesToConsumedPartitions(
}
}
}
+
+   /**
+* Co-location constraints are only used for iteration head and tail.
+* A paired head and tail needs to be in the same pipelined region so
+* that they can be restarted together.
+*/
+   private void ensureCoLocatedVerticesInSameRegion() {
+   final Map constraintToRegion = new IdentityHashMap<>();

Review comment:
   Dealing with the `null` constraint is a bug here and has resulted in lots 
of test failures. I have fixed it.
   The test was incomplete, but I forgot about it because it passed due to the 
`null` constraint. I will fix it.
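
   For readers following along, a minimal sketch of the null-safe grouping 
being discussed (the vertex/constraint/region types are simplified stand-ins 
for the real scheduler classes):

```java
import java.util.IdentityHashMap;
import java.util.Map;

interface Vertex {
    Object coLocationConstraint(); // null for most vertices
    Object pipelinedRegion();
}

final class CoLocationCheck {
    static void ensureCoLocatedVerticesInSameRegion(Iterable<Vertex> vertices) {
        // IdentityHashMap: constraints are compared by object identity, not equals().
        Map<Object, Object> constraintToRegion = new IdentityHashMap<>();
        for (Vertex v : vertices) {
            Object constraint = v.coLocationConstraint();
            if (constraint == null) {
                continue; // unconstrained vertex: skip instead of mapping it
            }
            Object previous = constraintToRegion.putIfAbsent(constraint, v.pipelinedRegion());
            if (previous != null && previous != v.pipelinedRegion()) {
                throw new IllegalStateException(
                        "Co-located vertices (iteration head/tail) are in different pipelined regions");
            }
        }
    }
}
```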





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhuzhurk commented on a change in pull request #13958: [FLINK-19994][runtime] Do not force iteration job to generate one only pipelined region

2020-11-06 Thread GitBox


zhuzhurk commented on a change in pull request #13958:
URL: https://github.com/apache/flink/pull/13958#discussion_r519133383



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adapter/DefaultExecutionTopology.java
##
@@ -201,4 +195,32 @@ private static void connectVerticesToConsumedPartitions(
}
}
}
+
+   /**
+* Co-location constraints are only used for iteration head and tail.
+* A paired head and tail needs to be in the same pipelined region so
+* that they can be restarted together.
+*/
+   private void ensureCoLocatedVerticesInSameRegion() {
+   final Map constraintToRegion = new IdentityHashMap<>();

Review comment:
   done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhuzhurk commented on a change in pull request #13958: [FLINK-19994][runtime] Do not force iteration job to generate one only pipelined region

2020-11-06 Thread GitBox


zhuzhurk commented on a change in pull request #13958:
URL: https://github.com/apache/flink/pull/13958#discussion_r519129436



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adapter/DefaultExecutionTopology.java
##
@@ -201,4 +195,32 @@ private static void connectVerticesToConsumedPartitions(
}
}
}
+
+   /**
+* Co-location constraints are only used for iteration head and tail.
+* A paired head and tail needs to be in the same pipelined region so
+* that they can be restarted together.
+*/
+   private void ensureCoLocatedVerticesInSameRegion() {
+   final Map constraintToRegion = new IdentityHashMap<>();

Review comment:
   A nullable constraint is a bug here and has resulted in lots of test 
failures. I have fixed it.
   The test was incomplete, but I forgot about it because it passed due to the 
nullable constraint bug. I will fix it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #13974: [hotfix][doc] Fix a concurrent issue in testing.md

2020-11-06 Thread GitBox


flinkbot commented on pull request #13974:
URL: https://github.com/apache/flink/pull/13974#issuecomment-723399501


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 0073a428d1b41297708cbd146482e1d8509174cd (Sat Nov 07 
06:26:06 UTC 2020)
   
✅ no warnings
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-19644) Support read specific partition of Hive table in temporal join

2020-11-06 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-19644.

Resolution: Fixed

master (1.12): 09386f27b66c86e8148ae8d25d72e3c2ac552362

> Support read specific partition of Hive table in temporal join
> --
>
> Key: FLINK-19644
> URL: https://issues.apache.org/jira/browse/FLINK-19644
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive, Table SQL / Ecosystem
>Reporter: Leonard Xu
>Assignee: Leonard Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> It's a common case to use a Hive partitioned table as a dimension table.
> Currently the Hive table only supports loading all data. It would be helpful 
> if we could support reading a user-specified partition in a temporal join.
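
As a usage sketch, the join shape looks like this ({{orders}} and 
{{dim_rates}} are assumed to be already-registered tables; any 
'streaming-source.*' options controlling which partition is read are omitted, 
since their exact names should be taken from the Hive connector docs):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

class TemporalJoinSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Temporal join against a Hive dimension table; with this feature the
        // dimension side can be a specific (e.g. latest) partition instead of
        // the whole table.
        tEnv.executeSql(
                "SELECT o.order_id, r.rate " +
                "FROM orders AS o " +
                "JOIN dim_rates FOR SYSTEM_TIME AS OF o.proc_time AS r " +
                "ON o.currency = r.currency");
    }
}
{code}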



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] xccui opened a new pull request #13974: [hotfix][doc] Fix a concurrent issue in testing.md

2020-11-06 Thread GitBox


xccui opened a new pull request #13974:
URL: https://github.com/apache/flink/pull/13974


   ## What is the purpose of the change
   
   An example in `testing.md` uses the `synchronized` keyword to protect a static 
variable, which can cause concurrency issues when multiple sink instances access it. 
   
   ## Brief change log
   
   Use `Collections.synchronizedList()` to create a thread-safe list and remove 
the old `synchronized` keyword.
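   
   Illustratively (`CollectSink`/`values` are placeholder names, not 
necessarily the exact ones in `testing.md`):
   
```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CollectSink {
    // Before: a bare static list; `synchronized` blocks inside one class do not
    // protect accesses made by other sink instances or threads.
    // static final List<Long> values = new ArrayList<>();

    // After: a wrapper whose every method synchronizes on the list itself.
    static final List<Long> values =
            Collections.synchronizedList(new ArrayList<>());
}
```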
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] JingsongLi merged pull request #13729: [FLINK-19644][hive] Support read latest partition of Hive table in temporal join

2020-11-06 Thread GitBox


JingsongLi merged pull request #13729:
URL: https://github.com/apache/flink/pull/13729


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13957: [FLINK-19823][table][fs-connector] Filesystem connector supports de/serialization schema

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13957:
URL: https://github.com/apache/flink/pull/13957#issuecomment-722961884


   
   ## CI report:
   
   * eded6a80f900965a008691d061057dae9c0d7cfe Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9152)
 
   * 0c4e62b055b5cd62d95be004423262ed9ce5eb86 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9235)
 
   * f9c06b641c424a4e3f4e91ecc668b599dd45b6e1 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9240)
 
   * 41b8b8583a8b8e08979515d6bcb1475fdd231664 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] JingsongLi commented on a change in pull request #13957: [FLINK-19823][table][fs-connector] Filesystem connector supports de/serialization schema

2020-11-06 Thread GitBox


JingsongLi commented on a change in pull request #13957:
URL: https://github.com/apache/flink/pull/13957#discussion_r519122574



##
File path: 
flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/filesystem/FileSystemTableSink.java
##
@@ -251,6 +258,14 @@ private RowDataPartitionComputer partitionComputer() {
//noinspection unchecked
return 
Optional.of(FileInputFormatCompactReader.factory((FileInputFormat) 
format));
}
+   } else if (deserializationFormat != null) {
+   // NOTE, we need pass full format types to 
deserializationFormat
+   DeserializationSchema decoder = 
deserializationFormat.createRuntimeDecoder(
+   createSourceContext(context), 
getFormatDataType());
+   int[] projectedFields = IntStream.of(0, 
schema.getFieldCount()).toArray();

Review comment:
   I'll add a JSON compaction test.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] JingsongLi commented on a change in pull request #13957: [FLINK-19823][table][fs-connector] Filesystem connector supports de/serialization schema

2020-11-06 Thread GitBox


JingsongLi commented on a change in pull request #13957:
URL: https://github.com/apache/flink/pull/13957#discussion_r519122722



##
File path: 
flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/filesystem/DeserializationSchemaAdapter.java
##
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.filesystem;
+
+import org.apache.flink.api.common.io.DelimitedInputFormat;
+import org.apache.flink.api.common.serialization.DeserializationSchema;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.FileSourceSplit;
+import org.apache.flink.connector.file.src.reader.BulkFormat;
+import org.apache.flink.connector.file.src.util.ArrayResultIterator;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.metrics.MetricGroup;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.table.data.GenericRowData;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.flink.table.types.logical.RowType;
+import org.apache.flink.table.utils.PartitionPathUtils;
+import org.apache.flink.util.Collector;
+import org.apache.flink.util.InstantiationUtil;
+import org.apache.flink.util.UserCodeClassLoader;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Queue;
+import java.util.stream.Collectors;
+
+import static 
org.apache.flink.connector.file.src.util.CheckpointedPosition.NO_OFFSET;
+import static 
org.apache.flink.table.data.vector.VectorizedColumnBatch.DEFAULT_SIZE;
+
+/**
+ * Adapter to turn a {@link DeserializationSchema} into a {@link BulkFormat}.
+ */
+public class DeserializationSchemaAdapter implements BulkFormat {
+
+   private static final int BATCH_SIZE = 100;
+
+   // NOTE, deserializationSchema produce full format fields with original 
order
+   private final DeserializationSchema deserializationSchema;
+
+   private final String[] fieldNames;
+   private final DataType[] fieldTypes;
+   private final int[] projectFields;
+   private final RowType projectedRowType;
+
+   private final List partitionKeys;
+   private final String defaultPartValue;
+
+   private final int[] toProjectedField;
+   private final RowData.FieldGetter[] formatFieldGetters;
+
+   public DeserializationSchemaAdapter(
+   DeserializationSchema deserializationSchema,
+   TableSchema schema,
+   int[] projectFields,
+   List partitionKeys,
+   String defaultPartValue) {
+   this.deserializationSchema = deserializationSchema;
+   this.fieldNames = schema.getFieldNames();
+   this.fieldTypes = schema.getFieldDataTypes();
+   this.projectFields = projectFields;
+   this.partitionKeys = partitionKeys;
+   this.defaultPartValue = defaultPartValue;
+
+   List projectedNames = Arrays.stream(projectFields)
+   .mapToObj(idx -> schema.getFieldNames()[idx])
+   .collect(Collectors.toList());
+
+   this.projectedRowType = RowType.of(
+   Arrays.stream(projectFields).mapToObj(idx ->
+   
schema.getFieldDataTypes()[idx].getLogicalType()).toArray(LogicalType[]::new),
+   projectedNames.toArray(new String[0]));
+
+   List formatFields = 
Arrays.stream(schema.getFieldNames())
+   .filter(field -> !partitionKeys.contains(field))
+   .collect(Collectors.toList());
+
+   List formatProjectedFields = projectedNames.stream()
+   .filter(field -> 

[GitHub] [flink] flinkbot edited a comment on pull request #13957: [FLINK-19823][table][fs-connector] Filesystem connector supports de/serialization schema

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13957:
URL: https://github.com/apache/flink/pull/13957#issuecomment-722961884


   
   ## CI report:
   
   * eded6a80f900965a008691d061057dae9c0d7cfe Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9152)
 
   * 0c4e62b055b5cd62d95be004423262ed9ce5eb86 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9235)
 
   * f9c06b641c424a4e3f4e91ecc668b599dd45b6e1 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9240)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] JingsongLi commented on a change in pull request #13957: [FLINK-19823][table][fs-connector] Filesystem connector supports de/serialization schema

2020-11-06 Thread GitBox


JingsongLi commented on a change in pull request #13957:
URL: https://github.com/apache/flink/pull/13957#discussion_r519119368



##
File path: 
flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/filesystem/DeserializationSchemaAdapter.java
##
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.filesystem;
+
+import org.apache.flink.api.common.io.DelimitedInputFormat;
+import org.apache.flink.api.common.serialization.DeserializationSchema;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.FileSourceSplit;
+import org.apache.flink.connector.file.src.reader.BulkFormat;
+import org.apache.flink.connector.file.src.util.ArrayResultIterator;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.metrics.MetricGroup;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.table.data.GenericRowData;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.flink.table.types.logical.RowType;
+import org.apache.flink.table.utils.PartitionPathUtils;
+import org.apache.flink.util.Collector;
+import org.apache.flink.util.InstantiationUtil;
+import org.apache.flink.util.UserCodeClassLoader;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Queue;
+import java.util.stream.Collectors;
+
+import static 
org.apache.flink.connector.file.src.util.CheckpointedPosition.NO_OFFSET;
+import static 
org.apache.flink.table.data.vector.VectorizedColumnBatch.DEFAULT_SIZE;
+
+/**
+ * Adapter to turn a {@link DeserializationSchema} into a {@link BulkFormat}.
+ */
+public class DeserializationSchemaAdapter implements BulkFormat {
+
+   private static final int BATCH_SIZE = 100;
+
+   // NOTE, deserializationSchema produce full format fields with original 
order
+   private final DeserializationSchema deserializationSchema;
+
+   private final String[] fieldNames;
+   private final DataType[] fieldTypes;
+   private final int[] projectFields;
+   private final RowType projectedRowType;
+
+   private final List partitionKeys;
+   private final String defaultPartValue;
+
+   private final int[] toProjectedField;
+   private final RowData.FieldGetter[] formatFieldGetters;
+
+   public DeserializationSchemaAdapter(
+   DeserializationSchema deserializationSchema,
+   TableSchema schema,
+   int[] projectFields,
+   List partitionKeys,
+   String defaultPartValue) {
+   this.deserializationSchema = deserializationSchema;
+   this.fieldNames = schema.getFieldNames();
+   this.fieldTypes = schema.getFieldDataTypes();
+   this.projectFields = projectFields;
+   this.partitionKeys = partitionKeys;
+   this.defaultPartValue = defaultPartValue;
+
+   List projectedNames = Arrays.stream(projectFields)
+   .mapToObj(idx -> schema.getFieldNames()[idx])
+   .collect(Collectors.toList());
+
+   this.projectedRowType = RowType.of(
+   Arrays.stream(projectFields).mapToObj(idx ->
+   
schema.getFieldDataTypes()[idx].getLogicalType()).toArray(LogicalType[]::new),
+   projectedNames.toArray(new String[0]));
+
+   List formatFields = 
Arrays.stream(schema.getFieldNames())
+   .filter(field -> !partitionKeys.contains(field))
+   .collect(Collectors.toList());
+
+   List formatProjectedFields = projectedNames.stream()
+   .filter(field -> 

[jira] [Commented] (FLINK-19653) HiveCatalogITCase fails on azure

2020-11-06 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227733#comment-17227733
 ] 

Dian Fu commented on FLINK-19653:
-

HiveTableSourceITCase failed with a similar exception:
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9221=logs=fc5181b0-e452-5c8f-68de-1097947f6483=62110053-334f-5295-a0ab-80dd7e2babbf

> HiveCatalogITCase fails on azure
> 
>
> Key: FLINK-19653
> URL: https://issues.apache.org/jira/browse/FLINK-19653
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.12.0
>Reporter: Yun Gao
>Priority: Major
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=7628=logs=fc5181b0-e452-5c8f-68de-1097947f6483=62110053-334f-5295-a0ab-80dd7e2babbf
> {code:java}
> 2020-10-14T17:28:27.3065932Z [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 10.396 s <<< FAILURE! - in 
> org.apache.flink.table.catalog.hive.HiveCatalogITCase
> 2020-10-14T17:28:27.3066739Z [ERROR] 
> org.apache.flink.table.catalog.hive.HiveCatalogITCase  Time elapsed: 10.396 s 
>  <<< ERROR!
> 2020-10-14T17:28:27.3067248Z java.lang.IllegalStateException: Failed to 
> create HiveServer :Failed to get metastore connection
> 2020-10-14T17:28:27.3067925Z  at 
> com.klarna.hiverunner.HiveServerContainer.init(HiveServerContainer.java:101)
> 2020-10-14T17:28:27.3068360Z  at 
> com.klarna.hiverunner.builder.HiveShellBase.start(HiveShellBase.java:165)
> 2020-10-14T17:28:27.3068886Z  at 
> org.apache.flink.connectors.hive.FlinkStandaloneHiveRunner.createHiveServerContainer(FlinkStandaloneHiveRunner.java:217)
> 2020-10-14T17:28:27.3069678Z  at 
> org.apache.flink.connectors.hive.FlinkStandaloneHiveRunner.access$600(FlinkStandaloneHiveRunner.java:92)
> 2020-10-14T17:28:27.3070290Z  at 
> org.apache.flink.connectors.hive.FlinkStandaloneHiveRunner$2.before(FlinkStandaloneHiveRunner.java:131)
> 2020-10-14T17:28:27.3070763Z  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46)
> 2020-10-14T17:28:27.3071177Z  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2020-10-14T17:28:27.3071576Z  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2020-10-14T17:28:27.3071961Z  at 
> org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 2020-10-14T17:28:27.3072432Z  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-10-14T17:28:27.3072852Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 2020-10-14T17:28:27.3073316Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 2020-10-14T17:28:27.3073810Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 2020-10-14T17:28:27.3074287Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 2020-10-14T17:28:27.3074768Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 2020-10-14T17:28:27.3075281Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> 2020-10-14T17:28:27.3075798Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> 2020-10-14T17:28:27.3076239Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> 2020-10-14T17:28:27.3076648Z Caused by: java.lang.RuntimeException: Failed to 
> get metastore connection
> 2020-10-14T17:28:27.3077099Z  at 
> org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:169)
> 2020-10-14T17:28:27.3077650Z  at 
> com.klarna.hiverunner.HiveServerContainer.init(HiveServerContainer.java:84)
> 2020-10-14T17:28:27.3077947Z  ... 17 more
> 2020-10-14T17:28:27.3078655Z Caused by: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: 
> Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> 2020-10-14T17:28:27.3079236Z  at 
> org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:236)
> 2020-10-14T17:28:27.3079655Z  at 
> org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:388)
> 2020-10-14T17:28:27.3080038Z  at 
> org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:332)
> 2020-10-14T17:28:27.3080610Z  at 
> org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:312)
> 2020-10-14T17:28:27.3081099Z  at 
> org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:288)
> 2020-10-14T17:28:27.3081501Z  at 
> org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:166)
> 2020-10-14T17:28:27.3081784Z  ... 18 more
> 2020-10-14T17:28:27.3082140Z Caused by: java.lang.RuntimeException: Unable to 
> instantiate 

[GitHub] [flink] wuchong commented on pull request #13296: [FLINK-18774][format][debezium] Support debezium-avro format

2020-11-06 Thread GitBox


wuchong commented on pull request #13296:
URL: https://github.com/apache/flink/pull/13296#issuecomment-723396648


   Hi @caozhen1937, the compile stage of the CI failed. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13957: [FLINK-19823][table][fs-connector] Filesystem connector supports de/serialization schema

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13957:
URL: https://github.com/apache/flink/pull/13957#issuecomment-722961884


   
   ## CI report:
   
   * eded6a80f900965a008691d061057dae9c0d7cfe Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9152)
 
   * 0c4e62b055b5cd62d95be004423262ed9ce5eb86 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9235)
 
   * f9c06b641c424a4e3f4e91ecc668b599dd45b6e1 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-19868) Csv Serialization schema contains line delimiter

2020-11-06 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-19868:

Release Note: The 'csv.line-delimiter' option has been removed from the CSV 
format, because the line delimiter should be defined by the connector instead 
of the format. Users who have been using this option in previous Flink 
versions should alter such tables to remove it when upgrading to Flink 1.12. 
There should not be many users of this option. 

> Csv Serialization schema contains line delimiter
> 
>
> Key: FLINK-19868
> URL: https://issues.apache.org/jira/browse/FLINK-19868
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Reporter: Jingsong Lee
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> CsvRowSerializationSchema.serialize(Row.of("f0", "f1")) => f0,f1\n
> The CSV serialization schema is for a single line, so why does it contain a line delimiter?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wuchong commented on a change in pull request #13957: [FLINK-19823][table][fs-connector] Filesystem connector supports de/serialization schema

2020-11-06 Thread GitBox


wuchong commented on a change in pull request #13957:
URL: https://github.com/apache/flink/pull/13957#discussion_r519108294



##
File path: 
flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/filesystem/FileSystemTableSink.java
##
@@ -251,6 +258,14 @@ private RowDataPartitionComputer partitionComputer() {
//noinspection unchecked
return 
Optional.of(FileInputFormatCompactReader.factory((FileInputFormat) 
format));
}
+   } else if (deserializationFormat != null) {
+   // NOTE, we need pass full format types to 
deserializationFormat
+   DeserializationSchema decoder = 
deserializationFormat.createRuntimeDecoder(
+   createSourceContext(context), 
getFormatDataType());
+   int[] projectedFields = IntStream.of(0, 
schema.getFieldCount()).toArray();

Review comment:
   `IntStream.range`?
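
   (For context: `IntStream.of(a, b)` is a stream of exactly the two values 
`a` and `b`, while `IntStream.range(a, b)` enumerates the half-open interval 
`[a, b)`, which is what a full-field projection needs. A quick demo:)

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class IntStreamDemo {
    public static void main(String[] args) {
        System.out.println(Arrays.toString(IntStream.of(0, 4).toArray()));    // [0, 4]
        System.out.println(Arrays.toString(IntStream.range(0, 4).toArray())); // [0, 1, 2, 3]
    }
}
```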

##
File path: 
flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/filesystem/DeserializationSchemaAdapter.java
##
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.filesystem;
+
+import org.apache.flink.api.common.io.DelimitedInputFormat;
+import org.apache.flink.api.common.serialization.DeserializationSchema;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.FileSourceSplit;
+import org.apache.flink.connector.file.src.reader.BulkFormat;
+import org.apache.flink.connector.file.src.util.ArrayResultIterator;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.metrics.MetricGroup;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.table.data.GenericRowData;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.flink.table.types.logical.RowType;
+import org.apache.flink.table.utils.PartitionPathUtils;
+import org.apache.flink.util.Collector;
+import org.apache.flink.util.InstantiationUtil;
+import org.apache.flink.util.UserCodeClassLoader;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Queue;
+import java.util.stream.Collectors;
+
+import static 
org.apache.flink.connector.file.src.util.CheckpointedPosition.NO_OFFSET;
+import static 
org.apache.flink.table.data.vector.VectorizedColumnBatch.DEFAULT_SIZE;
+
+/**
+ * Adapter to turn a {@link DeserializationSchema} into a {@link BulkFormat}.
+ */
+public class DeserializationSchemaAdapter implements BulkFormat {
+
+   private static final int BATCH_SIZE = 100;
+
+   // NOTE, deserializationSchema produce full format fields with original 
order
+   private final DeserializationSchema deserializationSchema;
+
+   private final String[] fieldNames;
+   private final DataType[] fieldTypes;
+   private final int[] projectFields;
+   private final RowType projectedRowType;
+
+   private final List partitionKeys;
+   private final String defaultPartValue;
+
+   private final int[] toProjectedField;
+   private final RowData.FieldGetter[] formatFieldGetters;
+
+   public DeserializationSchemaAdapter(
+   DeserializationSchema<RowData> deserializationSchema,
+   TableSchema schema,
+   int[] projectFields,
+   List<String> partitionKeys,
+   String defaultPartValue) {
+   this.deserializationSchema = deserializationSchema;
+   this.fieldNames = schema.getFieldNames();
+   this.fieldTypes = schema.getFieldDataTypes();
+   this.projectFields = projectFields;
+   this.partitionKeys = partitionKeys;
+   this.defaultPartValue = defaultPartValue;

[GitHub] [flink] flinkbot edited a comment on pull request #13871: [FLINK-19544][k8s] Implement CheckpointRecoveryFactory based on Kubernetes API

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13871:
URL: https://github.com/apache/flink/pull/13871#issuecomment-720254759


   
   ## CI report:
   
   * ab63312aba03160fffba96646c4f39257dc7f760 UNKNOWN
   * 179472a87778a738c1f9f866156dcf6e024e69c7 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9202)
 
   * 98a7d79ea52247e17fb1374f40de32109d56d63c Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9239)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13864: [FLINK-19543][k8s] Implement RunningJobsRegistry, JobGraphStore based on Kubernetes API

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13864:
URL: https://github.com/apache/flink/pull/13864#issuecomment-719924065


   
   ## CI report:
   
   * 3c1eac765ebab2b74ade4ccbcf576e496e4bdffc Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9174)
 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9201)
 
   * 8afe5fbf796d216f655dd6e7ad7585f328196f54 UNKNOWN
   * a27c2c7198ec1cbaf7ad5d84c5d0562033b60e4e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9232)
 
   * 11f723158926cc5ddde4c3f80c0fde4f8b0e5910 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9238)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13729: [FLINK-19644][hive] Support read latest partition of Hive table in temporal join

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13729:
URL: https://github.com/apache/flink/pull/13729#issuecomment-713673157


   
   ## CI report:
   
   * aefdb5f32451d2a47b8cd57b46bc559ccde5b797 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9149)
 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9230)
 
   * 4a41ac978fceda679c3cd3f09a26011ed8d16029 UNKNOWN
   * 924b755349e6ef14f84d26a83a9addb4dd9dc6c6 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9237)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13871: [FLINK-19544][k8s] Implement CheckpointRecoveryFactory based on Kubernetes API

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13871:
URL: https://github.com/apache/flink/pull/13871#issuecomment-720254759


   
   ## CI report:
   
   * ab63312aba03160fffba96646c4f39257dc7f760 UNKNOWN
   * 179472a87778a738c1f9f866156dcf6e024e69c7 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9202)
 
   * 98a7d79ea52247e17fb1374f40de32109d56d63c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13864: [FLINK-19543][k8s] Implement RunningJobsRegistry, JobGraphStore based on Kubernetes API

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13864:
URL: https://github.com/apache/flink/pull/13864#issuecomment-719924065


   
   ## CI report:
   
   * 3c1eac765ebab2b74ade4ccbcf576e496e4bdffc Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9174)
 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9201)
 
   * 8afe5fbf796d216f655dd6e7ad7585f328196f54 UNKNOWN
   * a27c2c7198ec1cbaf7ad5d84c5d0562033b60e4e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9232)
 
   * 11f723158926cc5ddde4c3f80c0fde4f8b0e5910 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13729: [FLINK-19644][hive] Support read latest partition of Hive table in temporal join

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13729:
URL: https://github.com/apache/flink/pull/13729#issuecomment-713673157


   
   ## CI report:
   
   * aefdb5f32451d2a47b8cd57b46bc559ccde5b797 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9149)
 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9230)
 
   * 4a41ac978fceda679c3cd3f09a26011ed8d16029 UNKNOWN
   * 924b755349e6ef14f84d26a83a9addb4dd9dc6c6 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-20028) FileCompactionITCase is unstable

2020-11-06 Thread Jingsong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227723#comment-17227723
 ] 

Jingsong Lee commented on FLINK-20028:
--

Looks like it hangs in the source; let's keep looking.

> FileCompactionITCase is unstable
> 
>
> Key: FLINK-20028
> URL: https://issues.apache.org/jira/browse/FLINK-20028
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem, Table SQL / Ecosystem
>Affects Versions: 1.12.0
>Reporter: Stephan Ewen
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.12.0
>
>
> The {{ParquetFileCompactionITCase}} hangs and times out.
> Log: 
> https://dev.azure.com/sewen0794/Flink/_build/results?buildId=178&view=logs&j=66592496-52df-56bb-d03e-37509e1d9d0f&t=ae0269db-6796-5583-2e5f-d84757d711aa
> Exception:
> {code}
> org.junit.runners.model.TestTimedOutException: test timed out after 60 seconds
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:119)
>   at 
> org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:86)
>   at 
> org.apache.flink.table.planner.runtime.stream.sql.FileCompactionITCaseBase.testNonPartition(FileCompactionITCaseBase.java:91)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] JingsongLi commented on a change in pull request #13937: [FLINK-19886][hive] Integrate file compaction to Hive connector

2020-11-06 Thread GitBox


JingsongLi commented on a change in pull request #13937:
URL: https://github.com/apache/flink/pull/13937#discussion_r519097384



##
File path: 
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/read/HiveCompactReader.java
##
@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connectors.hive.read;
+
+import org.apache.flink.connector.file.src.reader.BulkFormat;
+import org.apache.flink.connectors.hive.HiveTablePartition;
+import org.apache.flink.connectors.hive.JobConfWrapper;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.table.catalog.CatalogTable;
+import org.apache.flink.table.catalog.hive.client.HiveShim;
+import org.apache.flink.table.catalog.hive.client.HiveShimLoader;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.filesystem.stream.compact.CompactBulkReader;
+import org.apache.flink.table.filesystem.stream.compact.CompactContext;
+import org.apache.flink.table.filesystem.stream.compact.CompactReader;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.RowType;
+
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.mapred.JobConf;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+
+import static 
org.apache.flink.connectors.hive.util.HivePartitionUtils.restorePartitionValueFromType;
+import static 
org.apache.flink.table.utils.PartitionPathUtils.extractPartitionSpecFromPath;
+
+/**
+ * A {@link CompactReader} that delegates to the Hive bulk format.
+ */
+public class HiveCompactReader extends CompactBulkReader<RowData> {
+
+   private HiveCompactReader(BulkFormat.Reader<RowData> reader) throws IOException {
+   super(reader);
+   }
+
+   public static CompactReader.Factory<RowData> factory(
+   StorageDescriptor sd,
+   Properties properties,
+   JobConf jobConf,
+   CatalogTable catalogTable,
+   String hiveVersion,
+   RowType producedRowType,
+   boolean useMapRedReader) {
+   return new Factory(
+   sd,
+   properties,
+   new JobConfWrapper(jobConf),
+   catalogTable.getPartitionKeys(),
+   catalogTable.getSchema().getFieldNames(),
+   catalogTable.getSchema().getFieldDataTypes(),
+   hiveVersion,
+   producedRowType,
+   useMapRedReader);
+   }
+
+   /**
+* Factory to create {@link HiveCompactReader}.
+*/
+   private static class Factory implements CompactReader.Factory<RowData> {
+
+   private static final long serialVersionUID = 1L;
+
+   private final StorageDescriptor sd;
+   private final Properties properties;
+   private final JobConfWrapper jobConfWrapper;
+   private final List<String> partitionKeys;
+   private final String[] fieldNames;
+   private final DataType[] fieldTypes;
+   private final String hiveVersion;
+   private final HiveShim shim;
+   private final RowType producedRowType;
+   private final boolean useMapRedReader;
+
+   private Factory(
+   StorageDescriptor sd,
+   Properties properties,
+   JobConfWrapper jobConfWrapper,
+   List<String> partitionKeys,
+   String[] fieldNames,
+   DataType[] fieldTypes,
+   String hiveVersion,
+   RowType producedRowType,
+   boolean useMapRedReader) {
+   this.sd = sd;
+   

[jira] [Created] (FLINK-20040) Hive streaming sink does not support static partition inserting well

2020-11-06 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-20040:


 Summary: Hive streaming sink does not support static partition 
inserting well
 Key: FLINK-20040
 URL: https://issues.apache.org/jira/browse/FLINK-20040
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Reporter: Jingsong Lee






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wangyang0918 commented on pull request #13871: [FLINK-19544][k8s] Implement CheckpointRecoveryFactory based on Kubernetes API

2020-11-06 Thread GitBox


wangyang0918 commented on pull request #13871:
URL: https://github.com/apache/flink/pull/13871#issuecomment-723388940


Addressed the minor comments, rebased onto the latest master, and force-pushed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] wangyang0918 commented on pull request #13864: [FLINK-19543][k8s] Implement RunningJobsRegistry, JobGraphStore based on Kubernetes API

2020-11-06 Thread GitBox


wangyang0918 commented on pull request #13864:
URL: https://github.com/apache/flink/pull/13864#issuecomment-723388865


Addressed the naming comments, rebased onto the latest master, and force-pushed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (FLINK-20038) Rectify the usage of ResultPartitionType#isPipelined() in partition tracker.

2020-11-06 Thread Jin Xing (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227720#comment-17227720
 ] 

Jin Xing edited comment on FLINK-20038 at 11/7/20, 4:10 AM:


Gentle ping [~zhuzh] and [~azagrebin], Flink masters, for shepherding: do you 
think the concern I mentioned in the description is valid?

There could be several ways to fix/improve, and I'm not sure which one is 
preferred:

1. Minor change to this 
[snippet|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/JobMasterPartitionTrackerImpl.java#L66]
 in the partition tracker, as below
{code:java}
ResultPartitionType rpType = 
resultPartitionDeploymentDescriptor.getPartitionType();
if (rpType == PIPELINED || rpType == PIPELINED_BOUNDED) {
   return;
}
{code}
2. Add a *_releaseOnConsumption_* attribute to ResultPartitionType: 
releaseOnConsumption=true for PIPELINED and PIPELINED_BOUNDED, 
*_releaseOnConsumption=false_* for BLOCKING; then check 
*_releaseOnConsumption_* in the partition tracker.

3. Currently the partition tracker tracks a partition only if it's not 
released on consumption; shall we consider removing this limitation? I don't 
see any critical issues if the partition tracker tracks all kinds of result 
partitions.

 


was (Author: jinxing6...@126.com):
Gentle ping [~zhuzh] and [~azagrebin], Flink masters, for shepherding: do you 
think the concern I mentioned in the description is valid?

There could be several ways to fix/improve, and I'm not sure which one is 
preferred:

1. Minor change to this 
[snippet|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/JobMasterPartitionTrackerImpl.java#L66]
 in the partition tracker, as below

 
{code:java}
ResultPartitionType rpType = 
resultPartitionDeploymentDescriptor.getPartitionType();
if (rpType == PIPELINED || rpType == PIPELINED_BOUNDED) {
   return;
}
{code}
2. Add a *_releaseOnConsumption_* attribute to ResultPartitionType: 
releaseOnConsumption=true for PIPELINED and PIPELINED_BOUNDED, 
*_releaseOnConsumption=false_* for BLOCKING; then check 
*_releaseOnConsumption_* in the partition tracker.

3. Currently the partition tracker tracks a partition only if it's not 
released on consumption; shall we consider removing this limitation? I don't 
see any critical issues if the partition tracker tracks all kinds of result 
partitions.

 

> Rectify the usage of ResultPartitionType#isPipelined() in partition tracker.
> 
>
> Key: FLINK-20038
> URL: https://issues.apache.org/jira/browse/FLINK-20038
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Reporter: Jin Xing
>Priority: Major
>
> After "FLIP-31: Pluggable Shuffle Service", users can extend and plug in new 
> shuffle manner, thus to benefit different scenarios. New shuffle manner tend 
> to bring in new abilities which could be leveraged by scheduling layer to 
> provide better performance.
> From my understanding, the characteristics of shuffle manner is exposed by 
> ResultPartitionType (e.g. isPipelined, isBlocking, hasBackPressure ...), and 
> leveraged by scheduling layer to conduct job. But seems that Flink doesn't 
> provide a way to describe the new characteristics from a plugged in shuffle 
> manner. I also find that scheduling layer have some weak assumptions on 
> ResultPartitionType. I detail by below example.
> In our internal Flink, we develop a new shuffle manner for batch jobs. 
> Characteristics can be summarized as below briefly:
> 1. Upstream task shuffle writes data to DISK;
> 2. Upstream task commits data while producing and notify "consumable" to 
> downstream BEFORE task finished;
> 3. Downstream is notified when upstream data is consumable and can be 
> scheduled according to resource;
> 4. When downstream task failover, only itself needs to be restarted because 
> upstream data is written into disk and replayable;
> We can tell the character of this new shuffle manner as:
> a. isPipelined=true – downstream task can consume data before upstream 
> finished;
> b. hasBackPressure=false – upstream task shuffle writes data to disk and can 
> finish by itself no matter if there's downstream task consumes the data in 
> time.
> But above new ResultPartitionType(isPipelined=true, hasBackPressure=false) 
> seems contradicts the partition lifecycle management in current scheduling 
> layer:
> 1. The above new shuffle manner needs partition tracker for lifecycle 
> management, but current Flink assumes that ALL "isPipelined=true" result 
> partitions are released on consumption and will not be taken care of by 
> partition tracker 
> 

[GitHub] [flink] flinkbot edited a comment on pull request #13957: [FLINK-19823][table][fs-connector] Filesystem connector supports de/serialization schema

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13957:
URL: https://github.com/apache/flink/pull/13957#issuecomment-722961884


   
   ## CI report:
   
   * eded6a80f900965a008691d061057dae9c0d7cfe Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9152)
 
   * 0c4e62b055b5cd62d95be004423262ed9ce5eb86 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9235)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-20038) Rectify the usage of ResultPartitionType#isPipelined() in partition tracker.

2020-11-06 Thread Jin Xing (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227720#comment-17227720
 ] 

Jin Xing commented on FLINK-20038:
--

Gentle ping [~zhuzh] and [~azagrebin], Flink masters, for shepherding: do you 
think the concern I mentioned in the description is valid?

There could be several ways to fix/improve, and I'm not sure which one is 
preferred:

1. Minor change to this 
[snippet|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/JobMasterPartitionTrackerImpl.java#L66]
 in the partition tracker, as below

 
{code:java}
ResultPartitionType rpType = 
resultPartitionDeploymentDescriptor.getPartitionType();
if (rpType == PIPELINED || rpType == PIPELINED_BOUNDED) {
   return;
}
{code}
2. Add a *_releaseOnConsumption_* attribute to ResultPartitionType: 
releaseOnConsumption=true for PIPELINED and PIPELINED_BOUNDED, 
*_releaseOnConsumption=false_* for BLOCKING; then check 
*_releaseOnConsumption_* in the partition tracker (see the sketch after this 
list).

3. Currently the partition tracker tracks a partition only if it's not 
released on consumption; shall we consider removing this limitation? I don't 
see any critical issues if the partition tracker tracks all kinds of result 
partitions.
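
For illustration, a minimal sketch of option 2. This is a simplified, 
hypothetical version: the real ResultPartitionType carries more attributes, 
and the accessor name is an assumption, not existing Flink API.
{code:java}
// Simplified sketch of ResultPartitionType with the proposed attribute.
// Only the two attributes relevant to this discussion are shown.
public enum ResultPartitionType {
    BLOCKING(false, /* releaseOnConsumption */ false),
    PIPELINED(true, /* releaseOnConsumption */ true),
    PIPELINED_BOUNDED(true, /* releaseOnConsumption */ true);

    private final boolean isPipelined;
    private final boolean releaseOnConsumption;

    ResultPartitionType(boolean isPipelined, boolean releaseOnConsumption) {
        this.isPipelined = isPipelined;
        this.releaseOnConsumption = releaseOnConsumption;
    }

    public boolean isPipelined() {
        return isPipelined;
    }

    // Hypothetical accessor: the partition tracker would test this flag
    // instead of isPipelined(), so a pipelined-but-tracked partition works.
    public boolean isReleasedOnConsumption() {
        return releaseOnConsumption;
    }
}
{code}
With this attribute, the early return in the partition tracker would check 
isReleasedOnConsumption() rather than the partition type itself.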

 

> Rectify the usage of ResultPartitionType#isPipelined() in partition tracker.
> 
>
> Key: FLINK-20038
> URL: https://issues.apache.org/jira/browse/FLINK-20038
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Reporter: Jin Xing
>Priority: Major
>
> After "FLIP-31: Pluggable Shuffle Service", users can extend and plug in new 
> shuffle manner, thus to benefit different scenarios. New shuffle manner tend 
> to bring in new abilities which could be leveraged by scheduling layer to 
> provide better performance.
> From my understanding, the characteristics of shuffle manner is exposed by 
> ResultPartitionType (e.g. isPipelined, isBlocking, hasBackPressure ...), and 
> leveraged by scheduling layer to conduct job. But seems that Flink doesn't 
> provide a way to describe the new characteristics from a plugged in shuffle 
> manner. I also find that scheduling layer have some weak assumptions on 
> ResultPartitionType. I detail by below example.
> In our internal Flink, we develop a new shuffle manner for batch jobs. 
> Characteristics can be summarized as below briefly:
> 1. Upstream task shuffle writes data to DISK;
> 2. Upstream task commits data while producing and notify "consumable" to 
> downstream BEFORE task finished;
> 3. Downstream is notified when upstream data is consumable and can be 
> scheduled according to resource;
> 4. When downstream task failover, only itself needs to be restarted because 
> upstream data is written into disk and replayable;
> We can tell the character of this new shuffle manner as:
> a. isPipelined=true – downstream task can consume data before upstream 
> finished;
> b. hasBackPressure=false – upstream task shuffle writes data to disk and can 
> finish by itself no matter if there's downstream task consumes the data in 
> time.
> But above new ResultPartitionType(isPipelined=true, hasBackPressure=false) 
> seems contradicts the partition lifecycle management in current scheduling 
> layer:
> 1. The above new shuffle manner needs partition tracker for lifecycle 
> management, but current Flink assumes that ALL "isPipelined=true" result 
> partitions are released on consumption and will not be taken care of by 
> partition tracker 
> ([link|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/JobMasterPartitionTrackerImpl.java#L66])
>  – the limitation is not correct for this case.
> From my understanding, the method of ResultPartitionType#isPipelined() 
> indicates whether data can be consumed while being produced, and it's 
> orthogonal to whether the partition is released on consumption. I propose to 
> have a fix on this and fully respect to the original meaning of 
> ResultPartitionType#isPipelined().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #13864: [FLINK-19543][k8s] Implement RunningJobsRegistry, JobGraphStore based on Kubernetes API

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13864:
URL: https://github.com/apache/flink/pull/13864#issuecomment-719924065


   
   ## CI report:
   
   * 3c1eac765ebab2b74ade4ccbcf576e496e4bdffc Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9174)
 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9201)
 
   * 8afe5fbf796d216f655dd6e7ad7585f328196f54 UNKNOWN
   * a27c2c7198ec1cbaf7ad5d84c5d0562033b60e4e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9232)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-20034) Add maxwell-json format document

2020-11-06 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu closed FLINK-20034.
---
Fix Version/s: (was: 1.12.0)
   Resolution: Duplicate

> Add maxwell-json format document
> 
>
> Key: FLINK-20034
> URL: https://issues.apache.org/jira/browse/FLINK-20034
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, Formats (JSON, Avro, Parquet, ORC, 
> SequenceFile)
>Affects Versions: 1.12.0
>Reporter: hailong wang
>Priority: Minor
>
> Maxwell-json format document is missing, we should add maxwell.md and 
> maxwell.zh.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18897) Add documentation for the maxwell-json format

2020-11-06 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227719#comment-17227719
 ] 

Jark Wu commented on FLINK-18897:
-

Hi [~shen_dijie], are you still working on this?

> Add documentation for the maxwell-json format
> -
>
> Key: FLINK-18897
> URL: https://issues.apache.org/jira/browse/FLINK-18897
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation, Formats (JSON, Avro, Parquet, ORC, 
> SequenceFile), Table SQL / Ecosystem
>Reporter: Jark Wu
>Assignee: jinxin
>Priority: Major
> Fix For: 1.12.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-19488) Failed compilation of generated class

2020-11-06 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227718#comment-17227718
 ] 

Jark Wu edited comment on FLINK-19488 at 11/7/20, 4:03 AM:
---

[~satyamshekhar], there is a bug in code generation for the NOW() function. 
This has been fixed by FLINK-19948 recently. 

Could you use {{CURRENT_TIMESTAMP()}} and try again? It has the same 
semantics as NOW(). 
You can also build the latest release-1.11 branch and try the NOW() function 
there. 


was (Author: jark):
[~satyamshekhar], there is a bug in code generation for the NOW() function. 
This has been fixed by FLINK-19948 recently. 

Could you use {{CURRENT_TIMESTAMP()}} and try again? It has the same 
semantics as NOW(). You can also build the latest release-1.11 branch and try 
it there. 

> Failed compilation of generated class
> -
>
> Key: FLINK-19488
> URL: https://issues.apache.org/jira/browse/FLINK-19488
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.11.1
> Environment: Flink version: 1.11.1
>  
>Reporter: Satyam Shekhar
>Priority: Major
> Attachments: code.java
>
>
> I have a table T0 with the following schema -
>  
> {code:java}
> root
>      |-- C0: BIGINT
>      |-- C1: STRING
>      |-- blaze_itime: TIMESTAMP(3)
> {code}
>  
> The following SQL query fails for the above table - 
> {code:java}
> SELECT A.C0 AS output, A.C1 AS dim0 FROM T0 AS A WHERE (A.blaze_itime BETWEEN 
> NOW() - INTERVAL '10' MINUTE AND NOW());
> {code}
>  
> The generated code for the above query tries to assign a long value to 
> timestamp type and fails to compile with the following exception -
> {code:java}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'BatchCalc$14'java.lang.RuntimeException: Could not instantiate generated 
> class 'BatchCalc$14' at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:67)
>  at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:70)
>  at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:470)
>  at
> ...
> Caused by: org.codehaus.commons.compiler.CompileException: Line 55, Column 
> 21: Assignment conversion not possible from type "long" to type 
> "org.apache.flink.table.data.TimestampData"Caused by: 
> org.codehaus.commons.compiler.CompileException: Line 55, Column 21: 
> Assignment conversion not possible from type "long" to type 
> "org.apache.flink.table.data.TimestampData" at 
> org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124) at 
> org.codehaus.janino.UnitCompiler.assignmentConversion(UnitCompiler.java:10975)
>  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3788) at 
> org.codehaus.janino.UnitCompiler.access$6100(UnitCompiler.java:215)
> {code}
>  
> The generated code is added as an attachment to the issue.
>  
> The Environment has the following configuration parameters -
> {code:java}
> env.setParallelism(Integer.getInteger("flinkParallelism", 2));
> env.getConfig().enableObjectReuse();
> var settings = EnvironmentSettings.newInstance()
>   .useBlinkPlanner()
>   .inBatchMode()
>   .build();
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19488) Failed compilation of generated class

2020-11-06 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227718#comment-17227718
 ] 

Jark Wu commented on FLINK-19488:
-

[~satyamshekhar], there is a bug in code generation for the NOW() function. 
This has been fixed by FLINK-19948 recently. 

Could you use {{CURRENT_TIMESTAMP()}} and try again? It has the same 
semantics as NOW(). You can also build the latest release-1.11 branch and try 
it there. 
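
For illustration, the reported query rewritten with the suggested workaround 
(a sketch assuming the same table T0; in Flink SQL, CURRENT_TIMESTAMP is 
written without parentheses):
{code:sql}
SELECT A.C0 AS output, A.C1 AS dim0
FROM T0 AS A
WHERE A.blaze_itime BETWEEN CURRENT_TIMESTAMP - INTERVAL '10' MINUTE
                        AND CURRENT_TIMESTAMP;
{code}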

> Failed compilation of generated class
> -
>
> Key: FLINK-19488
> URL: https://issues.apache.org/jira/browse/FLINK-19488
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.11.1
> Environment: Flink version: 1.11.1
>  
>Reporter: Satyam Shekhar
>Priority: Major
> Attachments: code.java
>
>
> I have a table T0 with the following schema -
>  
> {code:java}
> root
>      |-- C0: BIGINT
>      |-- C1: STRING
>      |-- blaze_itime: TIMESTAMP(3)
> {code}
>  
> The following SQL query fails for the above table - 
> {code:java}
> SELECT A.C0 AS output, A.C1 AS dim0 FROM T0 AS A WHERE (A.blaze_itime BETWEEN 
> NOW() - INTERVAL '10' MINUTE AND NOW());
> {code}
>  
> The generated code for the above query tries to assign a long value to 
> timestamp type and fails to compile with the following exception -
> {code:java}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'BatchCalc$14'java.lang.RuntimeException: Could not instantiate generated 
> class 'BatchCalc$14' at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:67)
>  at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:70)
>  at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:470)
>  at
> ...
> Caused by: org.codehaus.commons.compiler.CompileException: Line 55, Column 
> 21: Assignment conversion not possible from type "long" to type 
> "org.apache.flink.table.data.TimestampData"Caused by: 
> org.codehaus.commons.compiler.CompileException: Line 55, Column 21: 
> Assignment conversion not possible from type "long" to type 
> "org.apache.flink.table.data.TimestampData" at 
> org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124) at 
> org.codehaus.janino.UnitCompiler.assignmentConversion(UnitCompiler.java:10975)
>  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3788) at 
> org.codehaus.janino.UnitCompiler.access$6100(UnitCompiler.java:215)
> {code}
>  
> The generated code is added as an attachment to the issue.
>  
> The Environment has the following configuration parameters -
> {code:java}
> env.setParallelism(Integer.getInteger("flinkParallelism", 2));
> env.getConfig().enableObjectReuse();
> var settings = EnvironmentSettings.newInstance()
>   .useBlinkPlanner()
>   .inBatchMode()
>   .build();
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-19488) Failed compilation of generated class

2020-11-06 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227718#comment-17227718
 ] 

Jark Wu edited comment on FLINK-19488 at 11/7/20, 4:03 AM:
---

[~satyamshekhar], there is a bug in code generation for the NOW() function. 
This has been fixed by FLINK-19948 recently. 

Could you use {{CURRENT_TIMESTAMP()}} and try again? It has the same 
semantics as NOW(). 
You can also build the latest release-1.11 branch and try the NOW() function. 


was (Author: jark):
[~satyamshekhar], there is a bug in code generation for the NOW() function. 
This has been fixed by FLINK-19948 recently. 

Could you use {{CURRENT_TIMESTAMP()}} and try again? It has the same 
semantics as NOW(). 
You can also build the latest release-1.11 branch and try the NOW() function 
there. 

> Failed compilation of generated class
> -
>
> Key: FLINK-19488
> URL: https://issues.apache.org/jira/browse/FLINK-19488
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.11.1
> Environment: Flink version: 1.11.1
>  
>Reporter: Satyam Shekhar
>Priority: Major
> Attachments: code.java
>
>
> I have a table T0 with the following schema -
>  
> {code:java}
> root
>      |-- C0: BIGINT
>      |-- C1: STRING
>      |-- blaze_itime: TIMESTAMP(3)
> {code}
>  
> The following SQL query fails for the above table - 
> {code:java}
> SELECT A.C0 AS output, A.C1 AS dim0 FROM T0 AS A WHERE (A.blaze_itime BETWEEN 
> NOW() - INTERVAL '10' MINUTE AND NOW());
> {code}
>  
> The generated code for the above query tries to assign a long value to 
> timestamp type and fails to compile with the following exception -
> {code:java}
> java.lang.RuntimeException: Could not instantiate generated class 
> 'BatchCalc$14'java.lang.RuntimeException: Could not instantiate generated 
> class 'BatchCalc$14' at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:67)
>  at 
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:40)
>  at 
> org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:70)
>  at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:470)
>  at
> ...
> Caused by: org.codehaus.commons.compiler.CompileException: Line 55, Column 
> 21: Assignment conversion not possible from type "long" to type 
> "org.apache.flink.table.data.TimestampData"Caused by: 
> org.codehaus.commons.compiler.CompileException: Line 55, Column 21: 
> Assignment conversion not possible from type "long" to type 
> "org.apache.flink.table.data.TimestampData" at 
> org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124) at 
> org.codehaus.janino.UnitCompiler.assignmentConversion(UnitCompiler.java:10975)
>  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3788) at 
> org.codehaus.janino.UnitCompiler.access$6100(UnitCompiler.java:215)
> {code}
>  
> The generated code is added as an attachment to the issue.
>  
> The Environment has the following configuration parameters -
> {code:java}
> env.setParallelism(Integer.getInteger("flinkParallelism", 2));
> env.getConfig().enableObjectReuse();
> var settings = EnvironmentSettings.newInstance()
>   .useBlinkPlanner()
>   .inBatchMode()
>   .build();
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20036) Join Has NoUniqueKey when using mini-batch

2020-11-06 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227717#comment-17227717
 ] 

Jark Wu commented on FLINK-20036:
-

Thanks for reporting this. 

> Join Has NoUniqueKey when using mini-batch
> --
>
> Key: FLINK-20036
> URL: https://issues.apache.org/jira/browse/FLINK-20036
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.2
>Reporter: Rex Remind
>Priority: Major
> Fix For: 1.12.0
>
>
> Hello,
>  
> We tried out mini-batch mode and our Join suddenly had NoUniqueKey.
> Join:
> {code:java}
> Table membershipsTable = tableEnv.from(SOURCE_MEMBERSHIPS)
>   .renameColumns($("id").as("membership_id"))
>   .select($("*")).join(usersTable, $("user_id").isEqual($("id")));
> {code}
> Mini-batch config:
> {code:java}
> configuration.setString("table.exec.mini-batch.enabled", "true"); // enable 
> mini-batch optimization
> configuration.setString("table.exec.mini-batch.allow-latency", "5 s"); // use 
> 5 seconds to buffer input records
> configuration.setString("table.exec.mini-batch.size", "5000"); // the maximum 
> number of records can be buffered by each aggregate operator task
> {code}
>  
> Join with mini-batch:
> {code:java}
>  Join(joinType=[InnerJoin], where=[(user_id = id0)], select=[id, 
> group_id, user_id, uuid, owner, id0, deleted_at], 
> leftInputSpec=[NoUniqueKey], rightInputSpec=[NoUniqueKey]) 
> {code}
> Join without mini-batch:
> {code:java}
> Join(joinType=[InnerJoin], where=[(user_id = id0)], select=[id, group_id, 
> user_id, uuid, owner, id0, deleted_at], leftInputSpec=[HasUniqueKey], 
> rightInputSpec=[JoinKeyContainsUniqueKey])
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20037) there is a comment problem in fromValues(AbstractDataType rowType, Object... values) method of TableEnvironment

2020-11-06 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-20037:

Component/s: (was: API / Core)
 Table SQL / API

> there is a comment problem in fromValues(AbstractDataType rowType, 
> Object... values)  method of TableEnvironment
> ---
>
> Key: FLINK-20037
> URL: https://issues.apache.org/jira/browse/FLINK-20037
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / API
>Affects Versions: 1.11.1
>Reporter: yandufeng
>Priority: Trivial
>  Labels: starter
>
> There is a comment problem in the fromValues(AbstractDataType<?> rowType, 
> Object... values) method of TableEnvironment: I think the second column 
> name should be "name", not "f1". This is my first issue; if there is a 
> problem, please understand.
> * Examples:
> * {@code
> * tEnv.fromValues(
> * DataTypes.ROW(
> * DataTypes.FIELD("id", DataTypes.DECIMAL(10, 2)),
> * DataTypes.FIELD("name", DataTypes.STRING())
> * ),
> * row(1, "ABC"),
> * row(2L, "ABCDE")
> * )
> * }
> * will produce a Table with a schema as follows:
> * {@code
> * root
> * |-- id: DECIMAL(10, 2)
> * |-- f1: STRING
> * }
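
For reference, the corrected comment would describe the produced schema as (a 
sketch of the proposed fix):
{code}
root
 |-- id: DECIMAL(10, 2)
 |-- name: STRING
{code}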



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20036) Join Has NoUniqueKey when using mini-batch

2020-11-06 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-20036:

Fix Version/s: 1.12.0

> Join Has NoUniqueKey when using mini-batch
> --
>
> Key: FLINK-20036
> URL: https://issues.apache.org/jira/browse/FLINK-20036
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.2
>Reporter: Rex Remind
>Priority: Major
> Fix For: 1.12.0
>
>
> Hello,
>  
> We tried out mini-batch mode and our Join suddenly had NoUniqueKey.
> Join:
> {code:java}
> Table membershipsTable = tableEnv.from(SOURCE_MEMBERSHIPS)
>   .renameColumns($("id").as("membership_id"))
>   .select($("*")).join(usersTable, $("user_id").isEqual($("id")));
> {code}
> Mini-batch config:
> {code:java}
> configuration.setString("table.exec.mini-batch.enabled", "true"); // enable 
> mini-batch optimization
> configuration.setString("table.exec.mini-batch.allow-latency", "5 s"); // use 
> 5 seconds to buffer input records
> configuration.setString("table.exec.mini-batch.size", "5000"); // the maximum 
> number of records can be buffered by each aggregate operator task
> {code}
>  
> Join with mini-batch:
> {code:java}
>  Join(joinType=[InnerJoin], where=[(user_id = id0)], select=[id, 
> group_id, user_id, uuid, owner, id0, deleted_at], 
> leftInputSpec=[NoUniqueKey], rightInputSpec=[NoUniqueKey]) 
> {code}
> Join without mini-batch:
> {code:java}
> Join(joinType=[InnerJoin], where=[(user_id = id0)], select=[id, group_id, 
> user_id, uuid, owner, id0, deleted_at], leftInputSpec=[HasUniqueKey], 
> rightInputSpec=[JoinKeyContainsUniqueKey])
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-20038) Rectify the usage of ResultPartitionType#isPipelined() in partition tracker.

2020-11-06 Thread Jin Xing (Jira)
Jin Xing created FLINK-20038:


 Summary: Rectify the usage of ResultPartitionType#isPipelined() in 
partition tracker.
 Key: FLINK-20038
 URL: https://issues.apache.org/jira/browse/FLINK-20038
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Reporter: Jin Xing


After "FLIP-31: Pluggable Shuffle Service", users can extend and plug in new 
shuffle manner, thus to benefit different scenarios. New shuffle manner tend to 
bring in new abilities which could be leveraged by scheduling layer to provide 
better performance.

>From my understanding, the characteristics of shuffle manner is exposed by 
>ResultPartitionType (e.g. isPipelined, isBlocking, hasBackPressure ...), and 
>leveraged by scheduling layer to conduct job. But seems that Flink doesn't 
>provide a way to describe the new characteristics from a plugged in shuffle 
>manner. I also find that scheduling layer have some weak assumptions on 
>ResultPartitionType. I detail by below example.

In our internal Flink, we develop a new shuffle manner for batch jobs. 
Characteristics can be summarized as below briefly:
1. Upstream task shuffle writes data to DISK;
2. Upstream task commits data while producing and notify "consumable" to 
downstream BEFORE task finished;
3. Downstream is notified when upstream data is consumable and can be scheduled 
according to resource;
4. When downstream task failover, only itself needs to be restarted because 
upstream data is written into disk and replayable;

We can tell the character of this new shuffle manner as:
a. isPipelined=true – downstream task can consume data before upstream finished;
b. hasBackPressure=false – upstream task shuffle writes data to disk and can 
finish by itself no matter if there's downstream task consumes the data in time.

But above new ResultPartitionType(isPipelined=true, hasBackPressure=false) 
seems contradicts the partition lifecycle management in current scheduling 
layer:
1. The above new shuffle manner needs partition tracker for lifecycle 
management, but current Flink assumes that ALL "isPipelined=true" result 
partitions are released on consumption and will not be taken care of by 
partition tracker 
([link|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/JobMasterPartitionTrackerImpl.java#L66])
 – the limitation is not correct for this case.

>From my understanding, the method of ResultPartitionType#isPipelined() 
>indicates whether data can be consumed while being produced, and it's 
>orthogonal to whether the partition is released on consumption. I propose to 
>have a fix on this and fully respect to the original meaning of 
>ResultPartitionType#isPipelined().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-14356) Introduce "raw" format to (de)serialize message to a single field

2020-11-06 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu closed FLINK-14356.
---
Resolution: Fixed

Implemented in master (1.12.0) with:
 - d230058932d7563dff7de6bc8d8045dae4326f1f
 - 17c38118df65cce442b447c7c4b822b722ad895f

> Introduce "raw" format to (de)serialize message to a single field
> -
>
> Key: FLINK-14356
> URL: https://issues.apache.org/jira/browse/FLINK-14356
> Project: Flink
>  Issue Type: Improvement
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / API
>Reporter: jinfeng
>Assignee: jinfeng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> I want to use Flink SQL to write Kafka messages directly to HDFS. The 
> serialization and deserialization of messages are not involved in the 
> middle. The bytes of the message are directly converted into the first 
> field of the Row. However, the current RowSerializationSchema does not 
> support the conversion of bytes to VARBINARY. Can we add some special 
> RowSerializationSchema and RowDeserializationSchema? 
> 
> Copied from FLINK-9963:
> Sometimes it might be useful to just read or write a single value into Kafka 
> or other connectors. We should add a single-value SerializationSchemaFactory 
> and single-value DeserializationSchemaFactory, the types below and their 
> array types shall be considered.
> byte, short, int, long, float, double, string
> For the numeric types, we might want to specify the endian format.
> A string type single-value format will be added with this issue for future 
> reference.
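
For illustration, a minimal DDL using the new format (a sketch; the table 
name and connector options are hypothetical examples, not from this issue):
{code:sql}
-- Read each Kafka record as raw bytes into a single BYTES field.
CREATE TABLE kafka_raw_source (
  payload BYTES
) WITH (
  'connector' = 'kafka',
  'topic' = 'input-topic',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'raw'
);
{code}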



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20039) Streaming File Sink end-to-end test is unstable on Azure

2020-11-06 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-20039:

Fix Version/s: 1.12.0

> Streaming File Sink end-to-end test is unstable on Azure
> 
>
> Key: FLINK-20039
> URL: https://issues.apache.org/jira/browse/FLINK-20039
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.12.0
>Reporter: Jark Wu
>Priority: Major
> Fix For: 1.12.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9211&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529
> {code}
> 2020-11-06T19:56:33.5300892Z Nov 06 19:56:33 
> ==
> 2020-11-06T19:56:33.5301454Z Nov 06 19:56:33 Running 'Streaming File Sink 
> end-to-end test'
> 2020-11-06T19:56:33.5301877Z Nov 06 19:56:33 
> ==
> 2020-11-06T19:56:33.5315153Z Nov 06 19:56:33 TEST_DATA_DIR: 
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-33530977067
> 2020-11-06T19:56:33.6838741Z 
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test-runner-common.sh:
>  line 57: 
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_streaming_file_sink.sh:
>  No such file or directory
> 2020-11-06T19:56:33.6839644Z Nov 06 19:56:33 [FAIL] Test script contains 
> errors.
> 2020-11-06T19:56:33.6847199Z Nov 06 19:56:33 Checking of logs skipped.
> 2020-11-06T19:56:33.6847743Z Nov 06 19:56:33 
> 2020-11-06T19:56:33.6848433Z Nov 06 19:56:33 [FAIL] 'Streaming File Sink 
> end-to-end test' failed after 0 minutes and 0 seconds! Test exited with exit 
> code 1
> 2020-11-06T19:56:33.6848756Z Nov 06 19:56:33 
> 2020-11-06T19:56:33.6849046Z Nov 06 19:56:33 ##[group]Environment Information
> 2020-11-06T19:56:33.6849260Z Nov 06 19:56:33 Jps
> 2020-11-06T19:56:33.7825775Z Nov 06 19:56:33 77536 Jps
> 2020-11-06T19:56:33.7946109Z Nov 06 19:56:33 Disk information
> 2020-11-06T19:56:33.7958895Z Nov 06 19:56:33 Filesystem  Size  Used Avail 
> Use% Mounted on
> 2020-11-06T19:56:33.7959315Z Nov 06 19:56:33 udev3.7G 0  3.7G 
>   0% /dev
> 2020-11-06T19:56:33.7959726Z Nov 06 19:56:33 tmpfs   729M   18M  712M 
>   3% /run
> 2020-11-06T19:56:33.7960079Z Nov 06 19:56:33 /dev/sda190G   63G   28G 
>  70% /
> 2020-11-06T19:56:33.7960411Z Nov 06 19:56:33 tmpfs   3.7G  8.2k  3.7G 
>   1% /dev/shm
> 2020-11-06T19:56:33.7960765Z Nov 06 19:56:33 tmpfs   5.3M 0  5.3M 
>   0% /run/lock
> 2020-11-06T19:56:33.796Z Nov 06 19:56:33 tmpfs   3.7G 0  3.7G 
>   0% /sys/fs/cgroup
> 2020-11-06T19:56:33.7961479Z Nov 06 19:56:33 /dev/sda15  110M  3.8M  106M 
>   4% /boot/efi
> 2020-11-06T19:56:33.7961811Z Nov 06 19:56:33 /dev/sdb115G  4.4G  9.6G 
>  32% /mnt
> 2020-11-06T19:56:33.7962124Z Nov 06 19:56:33 Allocated ports
> 2020-11-06T19:56:33.8060627Z Nov 06 19:56:33 Active Internet connections 
> (only servers)
> 2020-11-06T19:56:33.8061440Z Nov 06 19:56:33 Proto Recv-Q Send-Q Local 
> Address   Foreign Address State   PID/Program name
> 2020-11-06T19:56:33.8062056Z Nov 06 19:56:33 tcp0  0 
> 127.0.0.1:42609 0.0.0.0:*   LISTEN  1574/containerd 
> 2020-11-06T19:56:33.8062589Z Nov 06 19:56:33 tcp0  0 0.0.0.0:22   
>0.0.0.0:*   LISTEN  1777/sshd   
> 2020-11-06T19:56:33.8063102Z Nov 06 19:56:33 tcp6   0  0 :::22
>:::*LISTEN  1777/sshd   
> 2020-11-06T19:56:33.8063617Z Nov 06 19:56:33 udp0  0 0.0.0.0:68   
>0.0.0.0:*   1099/dhclient   
> 2020-11-06T19:56:33.8069826Z Nov 06 19:56:33 Running docker containers
> 2020-11-06T19:56:33.8449153Z Nov 06 19:56:33 CONTAINER IDIMAGE
>   COMMAND  CREATED STATUS 
> PORTS   NAMES
> 2020-11-06T19:56:33.8451289Z Nov 06 19:56:33 7360a044662a
> k8s.gcr.io/pause:3.1   "/pause" About an hour ago   Exited 
> (0) About an hour ago   
> k8s_POD_flink-native-k8s-pyflink-application-1-taskmanager-1-1_default_b9fe79cc-84ff-4bae-95a0-6c105d8c6c61_0
> 2020-11-06T19:56:33.8452835Z Nov 06 19:56:33 691869b9d84e
> k8s.gcr.io/pause:3.1   "/pause" About an hour ago   Exited 
> (0) About an hour ago   
> k8s_POD_flink-native-k8s-pyflink-application-1-86985776d9-hc7gb_default_446bf0cc-f01c-4b50-97c6-f27ff3ccfc76_0
> 2020-11-06T19:56:33.8453885Z Nov 06 19:56:33 9f06de0387f870f311871ae1 
>   "/coredns 

[jira] [Updated] (FLINK-14356) Introduce "raw" format to (de)serialize message to a single field

2020-11-06 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-14356:

Summary: Introduce "raw" format to (de)serialize message to a single field  
(was: Introduce "single-field" format to (de)serialize message to a single 
field)

> Introduce "raw" format to (de)serialize message to a single field
> -
>
> Key: FLINK-14356
> URL: https://issues.apache.org/jira/browse/FLINK-14356
> Project: Flink
>  Issue Type: Improvement
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / API
>Reporter: jinfeng
>Assignee: jinfeng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> I want to use Flink SQL to write Kafka messages directly to HDFS. The 
> serialization and deserialization of messages are not involved in the 
> middle. The bytes of the message are directly converted into the first 
> field of the Row. However, the current RowSerializationSchema does not 
> support the conversion of bytes to VARBINARY. Can we add some special 
> RowSerializationSchema and RowDeserializationSchema? 
> 
> Copied from FLINK-9963:
> Sometimes it might be useful to just read or write a single value into Kafka 
> or other connectors. We should add a single-value SerializationSchemaFactory 
> and single-value DeserializationSchemaFactory, the types below and their 
> array types shall be considered.
> byte, short, int, long, float, double, string
> For the numeric types, we might want to specify the endian format.
> A string type single-value format will be added with this issue for future 
> reference.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wuchong merged pull request #13909: [FLINK-14356][table][formats] Introduce "raw" format to (de)serialize message to a single field

2020-11-06 Thread GitBox


wuchong merged pull request #13909:
URL: https://github.com/apache/flink/pull/13909


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] wuchong commented on pull request #13909: [FLINK-14356][table][formats] Introduce "raw" format to (de)serialize message to a single field

2020-11-06 Thread GitBox


wuchong commented on pull request #13909:
URL: https://github.com/apache/flink/pull/13909#issuecomment-723387225


   The failed e2e should not be related to this PR. I created 
https://issues.apache.org/jira/browse/FLINK-20039 to track the unstable e2e 
test. 
   
   Will merge this. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-20039) Streaming File Sink end-to-end test is unstable on Azure

2020-11-06 Thread Jark Wu (Jira)
Jark Wu created FLINK-20039:
---

 Summary: Streaming File Sink end-to-end test is unstable on Azure
 Key: FLINK-20039
 URL: https://issues.apache.org/jira/browse/FLINK-20039
 Project: Flink
  Issue Type: Bug
  Components: Connectors / FileSystem
Affects Versions: 1.12.0
Reporter: Jark Wu


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9211&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529


{code}
2020-11-06T19:56:33.5300892Z Nov 06 19:56:33 
==
2020-11-06T19:56:33.5301454Z Nov 06 19:56:33 Running 'Streaming File Sink 
end-to-end test'
2020-11-06T19:56:33.5301877Z Nov 06 19:56:33 
==
2020-11-06T19:56:33.5315153Z Nov 06 19:56:33 TEST_DATA_DIR: 
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-33530977067
2020-11-06T19:56:33.6838741Z 
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test-runner-common.sh: 
line 57: 
/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_streaming_file_sink.sh:
 No such file or directory
2020-11-06T19:56:33.6839644Z Nov 06 19:56:33 [FAIL] Test script contains errors.
2020-11-06T19:56:33.6847199Z Nov 06 19:56:33 Checking of logs skipped.
2020-11-06T19:56:33.6847743Z Nov 06 19:56:33 
2020-11-06T19:56:33.6848433Z Nov 06 19:56:33 [FAIL] 'Streaming File Sink 
end-to-end test' failed after 0 minutes and 0 seconds! Test exited with exit 
code 1
2020-11-06T19:56:33.6848756Z Nov 06 19:56:33 
2020-11-06T19:56:33.6849046Z Nov 06 19:56:33 ##[group]Environment Information
2020-11-06T19:56:33.6849260Z Nov 06 19:56:33 Jps
2020-11-06T19:56:33.7825775Z Nov 06 19:56:33 77536 Jps
2020-11-06T19:56:33.7946109Z Nov 06 19:56:33 Disk information
2020-11-06T19:56:33.7958895Z Nov 06 19:56:33 Filesystem  Size  Used Avail 
Use% Mounted on
2020-11-06T19:56:33.7959315Z Nov 06 19:56:33 udev3.7G 0  3.7G   
0% /dev
2020-11-06T19:56:33.7959726Z Nov 06 19:56:33 tmpfs   729M   18M  712M   
3% /run
2020-11-06T19:56:33.7960079Z Nov 06 19:56:33 /dev/sda190G   63G   28G  
70% /
2020-11-06T19:56:33.7960411Z Nov 06 19:56:33 tmpfs   3.7G  8.2k  3.7G   
1% /dev/shm
2020-11-06T19:56:33.7960765Z Nov 06 19:56:33 tmpfs   5.3M 0  5.3M   
0% /run/lock
2020-11-06T19:56:33.796Z Nov 06 19:56:33 tmpfs   3.7G 0  3.7G   
0% /sys/fs/cgroup
2020-11-06T19:56:33.7961479Z Nov 06 19:56:33 /dev/sda15  110M  3.8M  106M   
4% /boot/efi
2020-11-06T19:56:33.7961811Z Nov 06 19:56:33 /dev/sdb115G  4.4G  9.6G  
32% /mnt
2020-11-06T19:56:33.7962124Z Nov 06 19:56:33 Allocated ports
2020-11-06T19:56:33.8060627Z Nov 06 19:56:33 Active Internet connections (only 
servers)
2020-11-06T19:56:33.8061440Z Nov 06 19:56:33 Proto Recv-Q Send-Q Local Address  
 Foreign Address State   PID/Program name
2020-11-06T19:56:33.8062056Z Nov 06 19:56:33 tcp0  0 
127.0.0.1:42609 0.0.0.0:*   LISTEN  1574/containerd 
2020-11-06T19:56:33.8062589Z Nov 06 19:56:33 tcp0  0 0.0.0.0:22 
 0.0.0.0:*   LISTEN  1777/sshd   
2020-11-06T19:56:33.8063102Z Nov 06 19:56:33 tcp6   0  0 :::22  
 :::*LISTEN  1777/sshd   
2020-11-06T19:56:33.8063617Z Nov 06 19:56:33 udp0  0 0.0.0.0:68 
 0.0.0.0:*   1099/dhclient   
2020-11-06T19:56:33.8069826Z Nov 06 19:56:33 Running docker containers
2020-11-06T19:56:33.8449153Z Nov 06 19:56:33 CONTAINER IDIMAGE  
COMMAND  CREATED STATUS 
PORTS   NAMES
2020-11-06T19:56:33.8451289Z Nov 06 19:56:33 7360a044662a
k8s.gcr.io/pause:3.1   "/pause" About an hour ago   Exited (0) 
About an hour ago   
k8s_POD_flink-native-k8s-pyflink-application-1-taskmanager-1-1_default_b9fe79cc-84ff-4bae-95a0-6c105d8c6c61_0
2020-11-06T19:56:33.8452835Z Nov 06 19:56:33 691869b9d84e
k8s.gcr.io/pause:3.1   "/pause" About an hour ago   Exited (0) 
About an hour ago   
k8s_POD_flink-native-k8s-pyflink-application-1-86985776d9-hc7gb_default_446bf0cc-f01c-4b50-97c6-f27ff3ccfc76_0
2020-11-06T19:56:33.8453885Z Nov 06 19:56:33 9f06de0387f870f311871ae1   
"/coredns -conf /etc…"   About an hour ago   Exited (0) About an hour 
ago   
k8s_coredns_coredns-6955765f44-zxft7_kube-system_d7bdc27c-2e87-4949-9574-57b80e3b63cc_5
2020-11-06T19:56:33.8454878Z Nov 06 19:56:33 95d3c8c15deb70f311871ae1   
"/coredns -conf /etc…"   About an hour ago   Exited (0) About an hour 
ago   

[GitHub] [flink] flinkbot edited a comment on pull request #13957: [FLINK-19823][table][fs-connector] Filesystem connector supports de/serialization schema

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13957:
URL: https://github.com/apache/flink/pull/13957#issuecomment-722961884


   
   ## CI report:
   
   * eded6a80f900965a008691d061057dae9c0d7cfe Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9152)
 
   * 0c4e62b055b5cd62d95be004423262ed9ce5eb86 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13931: [FLINK-19811][table-planner] Introduce RexSimplify to simplify SEARCH conditions

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13931:
URL: https://github.com/apache/flink/pull/13931#issuecomment-722165982


   
   ## CI report:
   
   * abbd41e66fa93bcb058a25bfbb21a1389192a2f5 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9183)
 
   * 1bfce4d4cf3ad25b18bbcc654fcf89837fbefcbf Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9233)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13864: [FLINK-19543][k8s] Implement RunningJobsRegistry, JobGraphStore based on Kubernetes API

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13864:
URL: https://github.com/apache/flink/pull/13864#issuecomment-719924065


   
   ## CI report:
   
   * 3c1eac765ebab2b74ade4ccbcf576e496e4bdffc Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9174)
 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9201)
 
   * 8afe5fbf796d216f655dd6e7ad7585f328196f54 UNKNOWN
   * a27c2c7198ec1cbaf7ad5d84c5d0562033b60e4e UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] JingsongLi merged pull request #13937: [FLINK-19886][hive] Integrate file compaction to Hive connector

2020-11-06 Thread GitBox


JingsongLi merged pull request #13937:
URL: https://github.com/apache/flink/pull/13937


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-19886) Integrate file compaction to Hive connector

2020-11-06 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-19886.

Resolution: Fixed

master (1.12): f1f25e012b3fc7d338f169871addbd78b86f33bf

> Integrate file compaction to Hive connector
> ---
>
> Key: FLINK-19886
> URL: https://issues.apache.org/jira/browse/FLINK-19886
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] lirui-apache commented on a change in pull request #13729: [FLINK-19644][hive] Support read latest partition of Hive table in temporal join

2020-11-06 Thread GitBox


lirui-apache commented on a change in pull request #13729:
URL: https://github.com/apache/flink/pull/13729#discussion_r519093000



##
File path: 
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveLookupTableSource.java
##
@@ -0,0 +1,285 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connectors.hive;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.configuration.ReadableConfig;
+import org.apache.flink.connectors.hive.read.HiveInputFormatPartitionReader;
+import org.apache.flink.connectors.hive.read.HivePartitionFetcherContextBase;
+import org.apache.flink.connectors.hive.util.HivePartitionUtils;
+import org.apache.flink.table.catalog.CatalogTable;
+import org.apache.flink.table.catalog.ObjectPath;
+import org.apache.flink.table.catalog.hive.client.HiveShim;
+import org.apache.flink.table.connector.source.LookupTableSource;
+import org.apache.flink.table.connector.source.TableFunctionProvider;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.filesystem.FileSystemLookupFunction;
+import org.apache.flink.table.filesystem.PartitionFetcher;
+import org.apache.flink.table.filesystem.PartitionReader;
+import org.apache.flink.table.functions.TableFunction;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.RowType;
+import org.apache.flink.util.Preconditions;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.mapred.JobConf;
+
+import java.time.Duration;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+
+import static 
org.apache.flink.table.filesystem.FileSystemOptions.LOOKUP_JOIN_CACHE_TTL;
+import static 
org.apache.flink.table.filesystem.FileSystemOptions.STREAMING_SOURCE_CONSUME_START_OFFSET;
+import static 
org.apache.flink.table.filesystem.FileSystemOptions.STREAMING_SOURCE_MONITOR_INTERVAL;
+import static 
org.apache.flink.table.filesystem.FileSystemOptions.STREAMING_SOURCE_PARTITION_INCLUDE;
+
+/**
+ * Hive Table Source that has lookup ability.
+ *
+ * Hive table source has both lookup and continuous read ability: when it 
acts as a continuous read source
+ * it does not have the lookup ability, but it can be a temporal table just like 
other stream sources.
+ * When it acts as a bounded table, it has the lookup ability.
+ *
+ * A common use case is to use the hive table as a dimension table and always 
look up the latest partition data. In this
+ * case the hive table source is a continuous read source, but currently we 
implement it via a LookupFunction, because
+ * currently a TableSource cannot tell the downstream when the latest partition 
has finished being read. This is a
+ * temporary workaround and will be re-implemented in the future.
+ */
+public class HiveLookupTableSource extends HiveTableSource implements 
LookupTableSource {
+
+   private static final Duration DEFAULT_LOOKUP_MONITOR_INTERVAL = 
Duration.ofHours(1L);
+   private final Configuration configuration;
+   private Duration hiveTableReloadInterval;
+
+   public HiveLookupTableSource(
+   JobConf jobConf,
+   ReadableConfig flinkConf,
+   ObjectPath tablePath,
+   CatalogTable catalogTable) {
+   super(jobConf, flinkConf, tablePath, catalogTable);
+   this.configuration = new Configuration();
+   catalogTable.getOptions().forEach(configuration::setString);
+   validateLookupConfigurations();
+   }
+
+   @Override
+   public LookupRuntimeProvider getLookupRuntimeProvider(LookupContext 
context) {
+   return 
TableFunctionProvider.of(getLookupFunction(context.getKeys()));
+   }
+
+   @VisibleForTesting
+   TableFunction<RowData> getLookupFunction(int[][] keys) {
+   int[] keyIndices = new int[keys.length];
+   int i = 0;
+   for (int[] key : keys) {
+   if 

[GitHub] [flink] flinkbot edited a comment on pull request #13931: [FLINK-19811][table-planner] Introduce RexSimplify to simplify SEARCH conditions

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13931:
URL: https://github.com/apache/flink/pull/13931#issuecomment-722165982


   
   ## CI report:
   
   * abbd41e66fa93bcb058a25bfbb21a1389192a2f5 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9183)
 
   * 1bfce4d4cf3ad25b18bbcc654fcf89837fbefcbf UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13864: [FLINK-19543][k8s] Implement RunningJobsRegistry, JobGraphStore based on Kubernetes API

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13864:
URL: https://github.com/apache/flink/pull/13864#issuecomment-719924065


   
   ## CI report:
   
   * 3c1eac765ebab2b74ade4ccbcf576e496e4bdffc Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9174)
 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9201)
 
   * 8afe5fbf796d216f655dd6e7ad7585f328196f54 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13729: [FLINK-19644][hive] Support read latest partition of Hive table in temporal join

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13729:
URL: https://github.com/apache/flink/pull/13729#issuecomment-713673157


   
   ## CI report:
   
   * aefdb5f32451d2a47b8cd57b46bc559ccde5b797 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9149)
 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9230)
 
   * 4a41ac978fceda679c3cd3f09a26011ed8d16029 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-19282) Support watermark push down with WatermarkStrategy

2020-11-06 Thread godfrey he (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he updated FLINK-19282:
---
Fix Version/s: 1.12.0

> Support watermark push down with WatermarkStrategy
> --
>
> Key: FLINK-19282
> URL: https://issues.apache.org/jira/browse/FLINK-19282
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Shengkai Fang
>Assignee: Shengkai Fang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Push the {{WatermarkStrategy}} into the TableSourceScan via the interface 
> {{SupportsWatermarkPushDown}}. Sometimes users define the watermark on a 
> computed column; therefore, we first push the rowtime computed column into 
> the {{WatermarkStrategy}}. For more info, please take a look at the 
> [discussion|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Merge-SupportsComputedColumnPushDown-and-SupportsWatermarkPushDown-td44387.html].
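
A minimal sketch of a scan source accepting the pushed-down strategy through
{{SupportsWatermarkPushDown}}; the class and field names are hypothetical and
the other methods a real source must implement are omitted:

{code:java}
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.table.connector.source.abilities.SupportsWatermarkPushDown;
import org.apache.flink.table.data.RowData;

// Sketch only: the planner pushes the (possibly computed-column based)
// WatermarkStrategy into the scan source, so the source can emit watermarks
// itself instead of relying on a separate watermark operator.
class MyWatermarkAwareSource implements SupportsWatermarkPushDown {

    private WatermarkStrategy<RowData> watermarkStrategy;

    @Override
    public void applyWatermark(WatermarkStrategy<RowData> watermarkStrategy) {
        this.watermarkStrategy = watermarkStrategy;
    }
}
{code}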



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-19282) Support watermark push down with WatermarkStrategy

2020-11-06 Thread godfrey he (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he closed FLINK-19282.
--
Resolution: Implemented

> Support watermark push down with WatermarkStrategy
> --
>
> Key: FLINK-19282
> URL: https://issues.apache.org/jira/browse/FLINK-19282
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Shengkai Fang
>Assignee: Shengkai Fang
>Priority: Major
>  Labels: pull-request-available
>
> Push the {{WatermarkStrategy}} into the TableSourceScan via the interface 
> {{SupportsWatermarkPushDown}}. Sometimes users define the watermark on a 
> computed column; therefore, we first push the rowtime computed column into 
> the {{WatermarkStrategy}}. For more info, please take a look at the 
> [discussion|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Merge-SupportsComputedColumnPushDown-and-SupportsWatermarkPushDown-td44387.html].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19282) Support watermark push down with WatermarkStrategy

2020-11-06 Thread godfrey he (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227699#comment-17227699
 ] 

godfrey he commented on FLINK-19282:


master: 2c1e24b53f54486827f767ab3ab5b22abc6fc278

> Support watermark push down with WatermarkStrategy
> --
>
> Key: FLINK-19282
> URL: https://issues.apache.org/jira/browse/FLINK-19282
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Shengkai Fang
>Priority: Major
>  Labels: pull-request-available
>
> Push the {{WatermarkStrategy}} into the TableSourceScan via the interface 
> {{SupportsWatermarkPushDown}}. Sometimes users define the watermark on a 
> computed column; therefore, we first push the rowtime computed column into 
> the {{WatermarkStrategy}}. For more info, please take a look at the 
> [discussion|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Merge-SupportsComputedColumnPushDown-and-SupportsWatermarkPushDown-td44387.html].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-19282) Support watermark push down with WatermarkStrategy

2020-11-06 Thread godfrey he (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he reassigned FLINK-19282:
--

Assignee: Shengkai Fang

> Support watermark push down with WatermarkStrategy
> --
>
> Key: FLINK-19282
> URL: https://issues.apache.org/jira/browse/FLINK-19282
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Shengkai Fang
>Assignee: Shengkai Fang
>Priority: Major
>  Labels: pull-request-available
>
> Push the {{WatermarkStrategy}} into the TableSourceScan via the interface 
> {{SupportsWatermarkPushDown}}. Sometimes users define the watermark on a 
> computed column; therefore, we first push the rowtime computed column into 
> the {{WatermarkStrategy}}. For more info, please take a look at the 
> [discussion|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Merge-SupportsComputedColumnPushDown-and-SupportsWatermarkPushDown-td44387.html].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #13770: [FLINK-18858][connector-kinesis] Add Kinesis sources and sinks

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13770:
URL: https://github.com/apache/flink/pull/13770#issuecomment-715323245


   
   ## CI report:
   
   * c1af295630bc8064312d310846613a8a3b23bbca UNKNOWN
   * 42df702f984d75eb8ec74bc1a48613b20c0bb3c5 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9226)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] lirui-apache commented on a change in pull request #13937: [FLINK-19886][hive] Integrate file compaction to Hive connector

2020-11-06 Thread GitBox


lirui-apache commented on a change in pull request #13937:
URL: https://github.com/apache/flink/pull/13937#discussion_r519086775



##
File path: 
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/read/HiveCompactReader.java
##
@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connectors.hive.read;
+
+import org.apache.flink.connector.file.src.reader.BulkFormat;
+import org.apache.flink.connectors.hive.HiveTablePartition;
+import org.apache.flink.connectors.hive.JobConfWrapper;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.table.catalog.CatalogTable;
+import org.apache.flink.table.catalog.hive.client.HiveShim;
+import org.apache.flink.table.catalog.hive.client.HiveShimLoader;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.filesystem.stream.compact.CompactBulkReader;
+import org.apache.flink.table.filesystem.stream.compact.CompactContext;
+import org.apache.flink.table.filesystem.stream.compact.CompactReader;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.RowType;
+
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.mapred.JobConf;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+
+import static 
org.apache.flink.connectors.hive.util.HivePartitionUtils.restorePartitionValueFromType;
+import static 
org.apache.flink.table.utils.PartitionPathUtils.extractPartitionSpecFromPath;
+
+/**
+ * The {@link CompactReader} that delegates to the Hive bulk format.
+ */
+public class HiveCompactReader extends CompactBulkReader<RowData> {
+
+   private HiveCompactReader(BulkFormat.Reader<RowData> reader) throws 
IOException {
+   super(reader);
+   }
+
+   public static CompactReader.Factory<RowData> factory(
+   StorageDescriptor sd,
+   Properties properties,
+   JobConf jobConf,
+   CatalogTable catalogTable,
+   String hiveVersion,
+   RowType producedRowType,
+   boolean useMapRedReader) {
+   return new Factory(
+   sd,
+   properties,
+   new JobConfWrapper(jobConf),
+   catalogTable.getPartitionKeys(),
+   catalogTable.getSchema().getFieldNames(),
+   catalogTable.getSchema().getFieldDataTypes(),
+   hiveVersion,
+   producedRowType,
+   useMapRedReader);
+   }
+
+   /**
+* Factory to create {@link HiveCompactReader}.
+*/
+   private static class Factory implements CompactReader.Factory<RowData> {
+
+   private static final long serialVersionUID = 1L;
+
+   private final StorageDescriptor sd;
+   private final Properties properties;
+   private final JobConfWrapper jobConfWrapper;
+   private final List<String> partitionKeys;
+   private final String[] fieldNames;
+   private final DataType[] fieldTypes;
+   private final String hiveVersion;
+   private final HiveShim shim;
+   private final RowType producedRowType;
+   private final boolean useMapRedReader;
+
+   private Factory(
+   StorageDescriptor sd,
+   Properties properties,
+   JobConfWrapper jobConfWrapper,
+   List<String> partitionKeys,
+   String[] fieldNames,
+   DataType[] fieldTypes,
+   String hiveVersion,
+   RowType producedRowType,
+   boolean useMapRedReader) {
+   this.sd = sd;
+   

[GitHub] [flink] godfreyhe merged pull request #13449: [FLINK-19282][table sql/planner]Supports watermark push down with Wat…

2020-11-06 Thread GitBox


godfreyhe merged pull request #13449:
URL: https://github.com/apache/flink/pull/13449


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13973: [FLINK-19850] Add e2e tests for the new File Sink in the streaming mode

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13973:
URL: https://github.com/apache/flink/pull/13973#issuecomment-723320462


   
   ## CI report:
   
   * c9aaec278c8352ee530c04016a0f8f7ab59252a6 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9225)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13729: [FLINK-19644][hive] Support read latest partition of Hive table in temporal join

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13729:
URL: https://github.com/apache/flink/pull/13729#issuecomment-713673157


   
   ## CI report:
   
   * aefdb5f32451d2a47b8cd57b46bc559ccde5b797 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9149)
 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9230)
 
   * 4a41ac978fceda679c3cd3f09a26011ed8d16029 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] 1996fanrui commented on pull request #13885: [FLINK-19911] Read checkpoint stream with buffer to speedup restore

2020-11-06 Thread GitBox


1996fanrui commented on pull request #13885:
URL: https://github.com/apache/flink/pull/13885#issuecomment-723377721


   > Sorry to be late to the game here, but could you share a bit more 
information on what the original setup was?
   > Specifically, what was your checkpoint storage system that offered such 
bad stream read performance? Was it HDFS? OSS? S3?
   > 
   > Looking at this change here, it seems very big with more options and many 
changed classes, for "just" introducing a buffer in a stream. That makes me 
skeptical that this is fixed in the right place.
   > 
   > Two other options to solve this:
   > 
   > (1) Input stream buffering is a property of the `CheckpointStorage`. It is 
created there, rather than in the state backends that have to wrap the stream. 
That way it works for all users of the `CheckpointStorage`, not just the state 
backends that happened to be adjusted to wrap the stream.
   > 
   > (2) Alternatively, we can make it a contract that all FileSystem 
implementations return well-buffered streams. Some already do this by default, 
wrapping them with another buffered stream adds just another layer and extra 
copying of bytes, costing performance. The ones that do not do that are easily 
adjusted.
   > 
   > At a first glance, I'd say let's go with option (2) if possible, otherwise 
option (1).
   > Hence the question: Which FS did you use that had such bad performing 
streams?
   
   Thanks for your comments.
   I use HDFS; the slowdown mainly comes from the large number of small IO 
reads during restore. A Flink job that reproduces the test results is 
provided in the JIRA.
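   
   A minimal sketch of the kind of buffering being discussed, assuming a 
plain `java.io` wrapper rather than Flink's actual `CheckpointStorage` or 
`FileSystem` API (the helper name and the 64 KiB default are illustrative 
assumptions):
   
   ```java
   import java.io.BufferedInputStream;
   import java.io.InputStream;
   
   // Serve many small restore reads from an in-memory buffer instead of
   // hitting the file system (e.g. HDFS) for every read.
   final class RestoreStreams {
   
       private static final int DEFAULT_BUFFER_SIZE = 64 * 1024;
   
       private RestoreStreams() {}
   
       static InputStream maybeBuffer(InputStream in) {
           if (in instanceof BufferedInputStream) {
               return in; // already buffered: avoid an extra copy layer
           }
           return new BufferedInputStream(in, DEFAULT_BUFFER_SIZE);
       }
   }
   ```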



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] leonardBang commented on pull request #13729: [FLINK-19644][hive] Support read latest partition of Hive table in temporal join

2020-11-06 Thread GitBox


leonardBang commented on pull request #13729:
URL: https://github.com/apache/flink/pull/13729#issuecomment-723377617


   @flinkbot run azure



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-19692) Can't restore feedback channel from savepoint

2020-11-06 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-19692:

Fix Version/s: (was: statefun-2.3.0)

> Can't restore feedback channel from savepoint
> -
>
> Key: FLINK-19692
> URL: https://issues.apache.org/jira/browse/FLINK-19692
> Project: Flink
>  Issue Type: Bug
>  Components: Stateful Functions
>Affects Versions: statefun-2.0.0, statefun-2.1.0, statefun-2.2.0
>Reporter: Antti Kaikkonen
>Assignee: Igal Shilman
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: statefun-2.2.1
>
>
> When using the new statefun-flink-datastream integration the following error 
> is thrown by the *feedback -> union* task when trying to restore from a 
> savepoint:
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:204)
> at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:247)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:290)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:479)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:475)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:528)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: java.io.IOException: position out of bounds
> at 
> org.apache.flink.runtime.state.StatePartitionStreamProvider.getStream(StatePartitionStreamProvider.java:58)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.internalTimeServiceManager(StreamTaskStateInitializerImpl.java:235)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:167)
> ... 9 more
> Caused by: java.io.IOException: position out of bounds
> at 
> org.apache.flink.runtime.state.memory.ByteStreamStateHandle$ByteStateHandleInputStream.seek(ByteStreamStateHandle.java:124)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl$KeyGroupStreamIterator.next(StreamTaskStateInitializerImpl.java:442)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl$KeyGroupStreamIterator.next(StreamTaskStateInitializerImpl.java:395)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.internalTimeServiceManager(StreamTaskStateInitializerImpl.java:228)
> ... 10 more
> {code}
>  The error is only thrown when the feedback channel has been used. 
> I have tested with the [example 
> application|https://github.com/apache/flink-statefun/blob/master/statefun-examples/statefun-flink-datastream-example/src/main/java/org/apache/flink/statefun/examples/datastream/Example.java]
>  and the error is thrown only if it is modified to actually use the feedback 
> channel. I simply modified the invoke method to sometimes forward the 
> greeting to a random name: 
> {code:java}
> @Override
> public void invoke(Context context, Object input) {
>   int seen = seenCount.updateAndGet(MyFunction::increment);
>   context.send(GREETINGS, String.format("Hello %s at the %d-th time", input, 
> seen));
>   String[] names = {"Stephan", "Igal", "Gordon", "Seth", "Marta"};
>   ThreadLocalRandom random = ThreadLocalRandom.current();
>   int index = random.nextInt(names.length);
>   final String name2 = names[index];
>   if (random.nextDouble() < 0.5) context.send(new Address(GREET, name2), 
> input);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink-statefun] tzulitai commented on pull request #171: [FLINK-20009][docs] Add documentation link check to travis

2020-11-06 Thread GitBox


tzulitai commented on pull request #171:
URL: https://github.com/apache/flink-statefun/pull/171#issuecomment-723377364


   +1, LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-19748) KeyGroupRangeOffsets#KeyGroupOffsetsIterator should skip key groups that don't have a defined offset

2020-11-06 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai closed FLINK-19748.
---
Resolution: Fixed

flink/master: 3367c238ad9de0dc50735df33ff09852a3427b1b
flink/release-1.11: 12a2ade31b4ba3beb9eb3fe1d26064223fc02fec

> KeyGroupRangeOffsets#KeyGroupOffsetsIterator should skip key groups that 
> don't have a defined offset
> 
>
> Key: FLINK-19748
> URL: https://issues.apache.org/jira/browse/FLINK-19748
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.12.0, 1.11.3
>
>
> Currently, on commit the {{UnboundedFeedbackLogger}} only calls 
> {{startNewKeyGroup}} on the raw keyed stream for key groups that actually 
> have logged messages:
> https://github.com/apache/flink-statefun/blob/master/statefun-flink/statefun-flink-core/src/main/java/org/apache/flink/statefun/flink/core/logger/UnboundedFeedbackLogger.java#L102
> This means that it might skip some key groups, if a key group doesn't have 
> any logged messages.
> This doesn't conform to the expected usage of Flink's 
> {{KeyedStateCheckpointOutputStream}}, which expects {{startNewKeyGroup}} to 
> be invoked for ALL key groups within the range.
> The reason for this is that underneath, calling {{startNewKeyGroup}} would 
> also record the starting stream offset position for the key group.
> However, when iterating through a raw keyed stream, the key group offsets 
> iterator {{KeyGroupRangeOffsets#KeyGroupOffsetsIterator}} doesn't take into 
> account that some key groups weren't written and therefore do not have 
> offsets defined, and the streams will be seeked to incorrect positions.
> Ultimately, if some key groups were skipped while writing to the raw keyed 
> stream, the following error will be thrown on restore:
> {code}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:204)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:247)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:290)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:473)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:469)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:522)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: java.io.IOException: position out of bounds
>   at 
> org.apache.flink.runtime.state.StatePartitionStreamProvider.getStream(StatePartitionStreamProvider.java:58)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.internalTimeServiceManager(StreamTaskStateInitializerImpl.java:235)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:167)
>   ... 9 more
> Caused by: java.io.IOException: position out of bounds
>   at 
> org.apache.flink.runtime.state.memory.ByteStreamStateHandle$ByteStateHandleInputStream.seek(ByteStreamStateHandle.java:124)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl$KeyGroupStreamIterator.next(StreamTaskStateInitializerImpl.java:442)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl$KeyGroupStreamIterator.next(StreamTaskStateInitializerImpl.java:395)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.internalTimeServiceManager(StreamTaskStateInitializerImpl.java:228)
>   ... 10 more
> {code}
> h2. *Solution*
> We change the {{KeyGroupRangeOffsets#KeyGroupOffsetsIterator}} in Flink to 
> skip key groups that don't have a defined offset (i.e. {{startNewKeyGroup}} 
> wasn't called for these key groups).
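
A minimal sketch of the proposed skipping behavior; the class below is a
simplified stand-in for the actual offsets iterator, with hypothetical names:

{code:java}
import java.util.Iterator;
import java.util.NoSuchElementException;

// Key groups whose offset was never recorded (startNewKeyGroup not called,
// marked as -1 here) are skipped instead of seeking the stream to a bogus
// position. next() returns {keyGroupId, offset} pairs.
final class DefinedOffsetsIterator implements Iterator<long[]> {

    private final int firstKeyGroup; // start of the key-group range
    private final long[] offsets;    // offset per key group, -1 = undefined
    private int pos;

    DefinedOffsetsIterator(int firstKeyGroup, long[] offsets) {
        this.firstKeyGroup = firstKeyGroup;
        this.offsets = offsets;
        skipUndefined();
    }

    private void skipUndefined() {
        while (pos < offsets.length && offsets[pos] < 0) {
            pos++;
        }
    }

    @Override
    public boolean hasNext() {
        return pos < offsets.length;
    }

    @Override
    public long[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        long[] entry = new long[] {firstKeyGroup + pos, offsets[pos]};
        pos++;
        skipUndefined();
        return entry;
    }
}
{code}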



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-19741) InternalTimeServiceManager fails to restore due to corrupt reads if there are other users of raw keyed state streams

2020-11-06 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai closed FLINK-19741.
---
Resolution: Fixed

flink/master: c151abc5bd6b4b642100cf75f8981f08535c5936
flink/release-1.11: 6675979838ade22cd6e069950bee362ba52ef747

> InternalTimeServiceManager fails to restore due to corrupt reads if there are 
> other users of raw keyed state streams
> 
>
> Key: FLINK-19741
> URL: https://issues.apache.org/jira/browse/FLINK-19741
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.9.3, 1.10.2, 1.11.2
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.12.0, 1.11.3
>
>
> h2. *Diagnosis*
> Currently, when restoring a {{InternalTimeServiceManager}}, we always attempt 
> to read from the provided raw keyed state streams (using 
> {{InternalTimerServiceSerializationProxy}}):
> https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/InternalTimeServiceManagerImpl.java#L117
> This is incorrect, since we don't write with the 
> {{InternalTimerServiceSerializationProxy}} if the timers do not require 
> legacy synchronous snapshots:
> https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/InternalTimeServiceManagerImpl.java#L192
> (we currently only require that when users use RocksDB backend + heap timers).
> Therefore, the {{InternalTimeServiceManager}} can fail to be created on 
> restore due to corrupt reads in the case where:
> * a checkpoint was taken where {{useLegacySynchronousSnapshots}} is false 
> (hence nothing was written, and the time service manager does not use the raw 
> keyed stream)
> * the raw keyed stream is used elsewhere (e.g. in the Flink application's 
> user code)
> * on restore from the checkpoint, {{InternalTimeServiceManagerImpl.create()}} 
> attempts to read from the raw keyed stream with the 
> {{InternalTimerServiceSerializationProxy}}.
> Full error stack trace (with Flink 1.11.1):
> {code}
> 2020-10-21 13:16:51
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:204)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:247)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:290)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:479)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:475)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:528)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readUTF(DataInputStream.java:609)
>   at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>   at 
> org.apache.flink.streaming.api.operators.InternalTimerServiceSerializationProxy.read(InternalTimerServiceSerializationProxy.java:110)
>   at 
> org.apache.flink.core.io.PostVersionedIOReadableWritable.read(PostVersionedIOReadableWritable.java:76)
>   at 
> org.apache.flink.streaming.api.operators.InternalTimeServiceManager.restoreStateForKeyGroup(InternalTimeServiceManager.java:217)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.internalTimeServiceManager(StreamTaskStateInitializerImpl.java:234)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:167)
>   ... 9 more
> {code}
> h2. *Reproducing*
> - Have an application with any operator that uses and writes to raw keyed 
> state streams
> - Use heap backend + any timer factory or RocksDB backend + RocksDB timers
> - Take a savepoint or wait for a checkpoint, and trigger a restore
> h2. *Proposed Fix*
> The fix would be to also respect the {{useLegacySynchronousSnapshots}} flag 
> in:
> 

[GitHub] [flink] tzulitai closed pull request #13772: [FLINK-19748] Iterating key groups in raw keyed stream on restore fails if some key groups weren't written

2020-11-06 Thread GitBox


tzulitai closed pull request #13772:
URL: https://github.com/apache/flink/pull/13772


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] tzulitai closed pull request #13761: [FLINK-19741] Allow AbstractStreamOperator to skip restoring timers if subclasses are using raw keyed state

2020-11-06 Thread GitBox


tzulitai closed pull request #13761:
URL: https://github.com/apache/flink/pull/13761


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-19992) Integrate new orc to Hive source

2020-11-06 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-19992.

Resolution: Fixed

master (1.12): c33e1adbd401a2030d51863c940b8822f06005e5

> Integrate new orc to Hive source
> 
>
> Key: FLINK-19992
> URL: https://issues.apache.org/jira/browse/FLINK-19992
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> After introducing `OrcColumnarRowFileInputFormat`
> We need to integrate it into Hive, including Hive 2+ and Hive 1.X



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] JingsongLi merged pull request #13939: [FLINK-19992][hive] Integrate new orc to Hive source

2020-11-06 Thread GitBox


JingsongLi merged pull request #13939:
URL: https://github.com/apache/flink/pull/13939


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] JingsongLi commented on pull request #13939: [FLINK-19992][hive] Integrate new orc to Hive source

2020-11-06 Thread GitBox


JingsongLi commented on pull request #13939:
URL: https://github.com/apache/flink/pull/13939#issuecomment-723376249


   The e2e failure in `test_streaming_file_sink` looks like a network issue, 
which is unrelated to this PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-20037) there is a comment problem in fromValues(AbstractDataType rowType, Object... values) method of TableEnvironment

2020-11-06 Thread hailong wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227686#comment-17227686
 ] 

hailong wang commented on FLINK-20037:
--

Hi [~yandufeng], Thank you for reporting this.
I think you are right, feel free to open a PR.

> there is a comment problem in fromValues(AbstractDataType rowType, 
> Object... values)  method of TableEnvironment
> ---
>
> Key: FLINK-20037
> URL: https://issues.apache.org/jira/browse/FLINK-20037
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Affects Versions: 1.11.1
>Reporter: yandufeng
>Priority: Trivial
>  Labels: starter
>
> There is a comment problem in the fromValues(AbstractDataType rowType, 
> Object... values) method of TableEnvironment: I think the second column 
> name should be "name", not "f1". This is my first issue; if there is a 
> problem, please understand.
> * Examples:
> * {@code
> * tEnv.fromValues(
> * DataTypes.ROW(
> * DataTypes.FIELD("id", DataTypes.DECIMAL(10, 2)),
> * DataTypes.FIELD("name", DataTypes.STRING())
> * ),
> * row(1, "ABC"),
> * row(2L, "ABCDE")
> * )
> * }
> * will produce a Table with a schema as follows:
> * {@code
> * root
> * |-- id: DECIMAL(10, 2)
> * |-- f1: STRING
> * }
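
For reference, the corrected tail of the Javadoc example would read:

{code:java}
// will produce a Table with a schema as follows:
// root
// |-- id: DECIMAL(10, 2)
// |-- name: STRING
{code}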



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #13954: [FLINK-19903][Table SQL / API] Metadata fields for filesystem csv format

2020-11-06 Thread GitBox


flinkbot edited a comment on pull request #13954:
URL: https://github.com/apache/flink/pull/13954#issuecomment-722706660


   
   ## CI report:
   
   * 905052e4286ce9d1ed0a7d5c7813a0f35b4fbbdc Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9224)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-20037) there is a comment problem in fromValues(AbstractDataType rowType, Object... values) method of TableEnvironment

2020-11-06 Thread yandufeng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yandufeng updated FLINK-20037:
--
Description: 
There is a comment problem in the fromValues(AbstractDataType rowType, Object... 
values) method of TableEnvironment: I think the second column name should be 
"name", not "f1". This is my first issue; if there is a problem, please understand.

* Examples:
* {@code
* tEnv.fromValues(
* DataTypes.ROW(
* DataTypes.FIELD("id", DataTypes.DECIMAL(10, 2)),
* DataTypes.FIELD("name", DataTypes.STRING())
* ),
* row(1, "ABC"),
* row(2L, "ABCDE")
* )
* }
* will produce a Table with a schema as follows:
* {@code
* root
* |-- id: DECIMAL(10, 2)
* |-- f1: STRING
* }

  was:
There is a comment problem in the fromValues(AbstractDataType rowType, Object... 
values) method of TableEnvironment: I think the second column name should be 
"name", not "f1". This is my first issue; if there is a problem, please understand.
{quote}* Examples:
* {@code
* tEnv.fromValues(
* DataTypes.ROW(
* DataTypes.FIELD("id", DataTypes.DECIMAL(10, 2)),
* DataTypes.FIELD("name", DataTypes.STRING())
* ),
* row(1, "ABC"),
* row(2L, "ABCDE")
* )
* }
* will produce a Table with a schema as follows:
* {@code
* root
* |-- id: DECIMAL(10, 2)
* |-- f1: STRING
* }
*
* For more examples see \{@link #fromValues(Object...)}.{quote}


> there is a comment problem in fromValues(AbstractDataType rowType, 
> Object... values)  method of TableEnvironment
> ---
>
> Key: FLINK-20037
> URL: https://issues.apache.org/jira/browse/FLINK-20037
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Affects Versions: 1.11.1
>Reporter: yandufeng
>Priority: Trivial
>  Labels: starter
>
> There is a comment problem in the fromValues(AbstractDataType rowType, 
> Object... values) method of TableEnvironment: I think the second column 
> name should be "name", not "f1". This is my first issue; if there is a 
> problem, please understand.
> * Examples:
> * {@code
> * tEnv.fromValues(
> * DataTypes.ROW(
> * DataTypes.FIELD("id", DataTypes.DECIMAL(10, 2)),
> * DataTypes.FIELD("name", DataTypes.STRING())
> * ),
> * row(1, "ABC"),
> * row(2L, "ABCDE")
> * )
> * }
> * will produce a Table with a schema as follows:
> * {@code
> * root
> * |-- id: DECIMAL(10, 2)
> * |-- f1: STRING
> * }



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

