[GitHub] [flink] piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support for AzureFS

2019-04-29 Thread GitBox
piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support 
for AzureFS
URL: https://github.com/apache/flink/pull/8117#issuecomment-487765637
 
 
   Thanks for getting back and taking a look @tillrohrmann and @shuai-xu. To 
answer some of your top level comments / questions:
   1) The flink-azure-fs-hadoop jars are written out to the opt/ directory in 
the flink-dist (based on comments in the original review). I've tested this in 
local Flink jobs, and I'm still sorting out a few things so that I can test it 
on our internal Hadoop cluster.
   2) I can add some E2E tests along the lines of `test_shaded_presto_s3`. Do 
we have an Azure bucket at the project level that I should use? Or should I 
just add the tests similar to the IT case, so that whoever runs them can fill 
in their own Azure details?
   3) ITCase with HTTP - It seems Azure does expose this information via its 
REST API 
(https://docs.microsoft.com/en-us/rest/api/storagerp/storageaccounts/getproperties). 
I can try to hook this up to the IT case so that the HTTP tests only run if 
`supportsHttpsTrafficOnly` = false.
   4) Dependency jars - If I understand correctly, some of these dependent jars 
(like hadoop-azure / azure-storage) should be part of the Hadoop install, 
right? Or do I need to tweak things to package them?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support for AzureFS

2019-04-30 Thread GitBox
piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support 
for AzureFS
URL: https://github.com/apache/flink/pull/8117#issuecomment-488112927
 
 
   @tillrohrmann do you know if there's any special setup required for the 
Flink e2e tests to run on developer machines? I'm trying to run an azure_fs e2e 
test I've created, and it consistently fails with a timeout because the 
dispatcher REST endpoint hasn't come up within 20s. I see the same error when 
I run test_batch_wordcount.sh:
   ```
   $ export FLINK_DIR=flink-dist/target/flink-1.9-SNAPSHOT-bin/flink-1.9-SNAPSHOT
   $ export HADOOP_CLASSPATH=...
   $ export HADOOP_CONF_DIR=...
   $ flink-end-to-end-tests/run-single-test.sh flink-end-to-end-tests/test-scripts/test_batch_wordcount.sh
   Starting cluster.
   Starting standalonesession daemon on host C02T3863HF1R.
   Starting taskexecutor daemon on host C02T3863HF1R.
   Waiting for dispatcher REST endpoint to come up...
   ...
   Dispatcher REST endpoint has not started within a timeout of 20 sec
   [FAIL] Test script contains errors.
   Checking for errors...
   No errors in log files.
   Checking for exceptions...
   No exceptions in log files.
   Checking for non-empty .out files...
   Found non-empty .out files:
   SLF4J: Class path contains multiple SLF4J bindings.
   SLF4J: Found binding in [jar:file:/Users/p.narang/workspace/flink/flink-dist/target/flink-1.9-SNAPSHOT-bin/flink-1.9-SNAPSHOT/lib/slf4j-log4j12-1.7.15.jar!/org/slf4j/impl/StaticLoggerBinder.class]
   SLF4J: Found binding in [jar:file:/Users/p.narang/Downloads/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
   SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
   SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
   SLF4J: Class path contains multiple SLF4J bindings.
   SLF4J: Found binding in [jar:file:/Users/p.narang/workspace/flink/flink-dist/target/flink-1.9-SNAPSHOT-bin/flink-1.9-SNAPSHOT/lib/slf4j-log4j12-1.7.15.jar!/org/slf4j/impl/StaticLoggerBinder.class]
   SLF4J: Found binding in [jar:file:/Users/p.narang/Downloads/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
   SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
   SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
   
   [FAIL] 'flink-end-to-end-tests/test-scripts/test_batch_wordcount.sh' failed after 0 minutes and 26 seconds! Test exited with exit code 1 and the logs contained errors, exceptions or non-empty .out files
   ```
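
The "Waiting for dispatcher REST endpoint to come up..." step in the output above is a poll-until-timeout check. The following is a minimal sketch of that pattern for reproducing the wait by hand with a longer timeout. It is not the code used by the Flink test scripts, and the function name `wait_until` and its arguments are hypothetical.

```shell
# Hypothetical helper (not part of the Flink test scripts):
# retry a command once per second until it succeeds or the
# timeout (in seconds) elapses.
wait_until() {
  local timeout="$1"; shift
  local elapsed=0
  # Run the remaining arguments as a command, discarding its output.
  while ! "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Command '$*' did not succeed within ${timeout} sec" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

For example, `wait_until 60 curl -sf http://localhost:8081/overview` would poll a local REST endpoint for up to a minute. The port and path here are assumptions about a default local setup. A longer timeout helps tell a slow startup apart from an endpoint that never comes up.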




[GitHub] [flink] piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support for AzureFS

2019-04-30 Thread GitBox
piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support 
for AzureFS
URL: https://github.com/apache/flink/pull/8117#issuecomment-488128359
 
 
   P.S. @tillrohrmann / @shuai-xu - I was able to test this out on a Hadoop 
cluster. To get it to work, I had to copy the following jars to my lib dir: 
`azure-storage-2.0.0.jar`, `flink-azure-fs-hadoop-1.6-SNAPSHOT.jar`, and 
`hadoop-azure-*.jar`.
   I'll dig into how the other filesystems handle this in their uber jars, so 
that we can also generate a single flink-azure-fs-hadoop jar that includes the 
azure-storage and hadoop-azure deps.
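
The manual copy step can be sketched as a small helper that fails loudly when an expected jar is absent. This is only an illustration: the function name `stage_jars` is hypothetical, and the location of the lib dir is an assumption.

```shell
# Hypothetical helper: copy the given jars into a lib directory and
# report any jar that does not exist (including unmatched globs,
# which the shell passes through literally).
stage_jars() {
  local lib_dir="$1"; shift
  mkdir -p "$lib_dir"
  local jar
  for jar in "$@"; do
    if [ -f "$jar" ]; then
      cp "$jar" "$lib_dir/"
    else
      echo "missing jar: $jar" >&2
      return 1
    fi
  done
}
```

A call such as `stage_jars "$FLINK_DIR/lib" azure-storage-2.0.0.jar flink-azure-fs-hadoop-1.6-SNAPSHOT.jar hadoop-azure-*.jar` would then stop at the first jar it cannot find. An uber jar would remove the need for this step.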




[GitHub] [flink] piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support for AzureFS

2019-04-30 Thread GitBox
piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support 
for AzureFS
URL: https://github.com/apache/flink/pull/8117#issuecomment-488195295
 
 
   Seems like the missing ingredient was this commit: 
https://github.com/apache/flink/pull/8117/commits/8205a210b748110944ac27a3590b995d0c942a42
   This was missing when shuai tested and when I first built my 1.6 jars. 
   Unfortunately though, I am running into some shading issues when I try to 
use it on my cluster so I'll dig into them over the coming days:
   ```
   Caused by: java.lang.NoClassDefFoundError: org/apache/flink/fs/shaded/hadoop3/org/apache/hadoop/hdfs/HdfsConfiguration
       at org.apache.flink.fs.azurefs.AzureFileSystem.createInitializedAzureFS(AzureFileSystem.java:48)
   ...
   ```
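
One way to narrow down an error like this is to check whether the relocated class is actually packaged in the jar being deployed. The sketch below assumes the jar listing comes from `jar tf` (one entry path per line). The helper name `listing_has_class` is hypothetical.

```shell
# Hypothetical helper: read a jar listing on stdin (one entry per line,
# e.g. from `jar tf some.jar`) and succeed only if the given fully
# qualified class name is present as a .class entry.
listing_has_class() {
  local class_path
  # Convert a dotted class name into the path used inside the jar.
  class_path="$(printf '%s' "$1" | tr '.' '/').class"
  # -x: match the whole line, -F: treat the path as a fixed string.
  grep -qxF "$class_path"
}
```

With `$JAR` pointing at the built filesystem jar (an assumption), `jar tf "$JAR" | listing_has_class org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.hdfs.HdfsConfiguration` would show whether the shaded `HdfsConfiguration` class made it into the artifact.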




[GitHub] [flink] piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support for AzureFS

2019-05-03 Thread GitBox
piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support 
for AzureFS
URL: https://github.com/apache/flink/pull/8117#issuecomment-489094928
 
 
   Yeah, let me check. I kicked off a `mvn clean install -DskipTests` on my 
machine before submitting, and it went through OK. I'll try to reproduce what 
the Travis build is doing and see if I can trigger the failure.




[GitHub] [flink] piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support for AzureFS

2019-05-03 Thread GitBox
piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support 
for AzureFS
URL: https://github.com/apache/flink/pull/8117#issuecomment-489140860
 
 
   OK, I think I didn't catch the issue because I was using mvn 3.5, and it 
seems the shade plugin behaves a bit differently with it. When I built with 
3.2.5 (what we default to in the project), I was able to trigger the failure. 
   It seems to be caused by the flink-s3-fs-base module filtering out all the 
classes in the util package (where I had placed the HadoopConfigLoader). I'm 
not sure about the context of the original change, so I've updated the filter 
to exclude only the HadoopUtils class (and no other classes in that package).




[GitHub] [flink] piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support for AzureFS

2019-05-07 Thread GitBox
piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support 
for AzureFS
URL: https://github.com/apache/flink/pull/8117#issuecomment-490264770
 
 
   @tillrohrmann - I'm not really sure I understand why we need the 
flink-hadoop-fs-shaded module. Is the whole point of that to reduce the size of 
the flink-azure-fs jar? Or something else? Moreover, is it essential to add 
that right now? We've iterated a fair bit on this PR already and if there is 
something that can be addressed in a future review, I'd prefer doing that 
rather than trying to tackle everything in this one. 




[GitHub] [flink] piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support for AzureFS

2019-05-09 Thread GitBox
piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support 
for AzureFS
URL: https://github.com/apache/flink/pull/8117#issuecomment-491016295
 
 
   Thanks @tillrohrmann. I'm happy to put in a follow-up on the shading stuff. 
I can sit down with it next week and put up a fresh PR for that.




[GitHub] [flink] piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support for AzureFS

2019-04-08 Thread GitBox
piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support 
for AzureFS
URL: https://github.com/apache/flink/pull/8117#issuecomment-480938035
 
 
   @Myasuka - thanks for taking a look. Updated to add this to opt.xml and some 
docs as well. Let me know if you think I should expand / clarify the docs or 
add some documentation to a specific location apart from this. 




[GitHub] [flink] piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support for AzureFS

2019-04-12 Thread GitBox
piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support 
for AzureFS
URL: https://github.com/apache/flink/pull/8117#issuecomment-482722850
 
 
   @Myasuka - did you get a chance to take a look at the rework? Anything else 
to address or can we merge this?




[GitHub] [flink] piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support for AzureFS

2019-05-23 Thread GitBox
piyushnarang commented on issue #8117: [FLINK-12115] [filesystems]: Add support 
for AzureFS
URL: https://github.com/apache/flink/pull/8117#issuecomment-495319623
 
 
   @tillrohrmann - did you get a chance to take a stab at this?

