[jira] [Commented] (BEAM-9288) Conscrypt shaded dependency
[ https://issues.apache.org/jira/browse/BEAM-9288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034066#comment-17034066 ]

Igor Dvorzhak commented on BEAM-9288:
-------------------------------------

Would it be better to exclude Conscrypt from the shaded GCS IO jar and rely on a system-wide Conscrypt installation, if any?

> Conscrypt shaded dependency
> ---------------------------
>
>                 Key: BEAM-9288
>                 URL: https://issues.apache.org/jira/browse/BEAM-9288
>             Project: Beam
>          Issue Type: Bug
>          Components: build-system
>            Reporter: Esun Kim
>            Assignee: sunjincheng
>            Priority: Major
>
> Conscrypt is not designed to be shaded properly, mainly because of its .so
> files. I happened to see BEAM-9030 (*1) creating a new vendored gRPC that
> shades Conscrypt (*2). I think this could cause a problem when a new
> Conscrypt is brought in by a new gcsio depending on grpc-alts (*4) in a
> dependency chain (*5). In that case, there may be a conflict when locating
> the proper .so files for Conscrypt.
> *1: https://issues.apache.org/jira/browse/BEAM-9030
> *2: [https://github.com/apache/beam/blob/e24d1e51cbabe27cb3cc381fd95b334db639c45d/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_26_0.groovy#L78]
> *3: https://issues.apache.org/jira/browse/BEAM-6136
> *4: [https://mvnrepository.com/artifact/io.grpc/grpc-alts/1.27.0]
> *5: https://issues.apache.org/jira/browse/BEAM-8889

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
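Relying on a system-wide Conscrypt, as the comment suggests, is commonly done by looking the provider up reflectively so the code works whether or not Conscrypt is on the classpath. The sketch below is illustrative, not Beam code; `Conscrypt.newProvider()` is the real Conscrypt entry point, but the surrounding class is hypothetical.

```java
// Sketch: prefer a system-wide (unshaded) Conscrypt if present, otherwise
// fall back to the JVM's default JSSE provider. Reflection keeps Conscrypt
// an optional dependency instead of a shaded one.
import java.security.Provider;
import java.security.Security;

public class OptionalConscrypt {
    static Provider tryLoadConscrypt() {
        try {
            Class<?> conscrypt = Class.forName("org.conscrypt.Conscrypt");
            return (Provider) conscrypt.getMethod("newProvider").invoke(null);
        } catch (ReflectiveOperationException e) {
            return null;  // Conscrypt not installed; use the default JSSE
        }
    }

    public static void main(String[] args) {
        Provider p = tryLoadConscrypt();
        if (p != null) {
            Security.insertProviderAt(p, 1);  // prefer Conscrypt for TLS
            System.out.println("Using " + p.getName());
        } else {
            System.out.println("Conscrypt not available; using default JSSE");
        }
    }
}
```

Because the `.so` files ship with the host's Conscrypt rather than inside the shaded jar, this sidesteps the native-library conflict the issue describes.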
[jira] [Closed] (BEAM-6736) Upgrade gcsio dependency to 1.9.15
[ https://issues.apache.org/jira/browse/BEAM-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Dvorzhak closed BEAM-6736.
-------------------------------
    Resolution: Fixed

> Upgrade gcsio dependency to 1.9.15
> ----------------------------------
>
>                 Key: BEAM-6736
>                 URL: https://issues.apache.org/jira/browse/BEAM-6736
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp, sdk-java-core
>    Affects Versions: 2.10.0, 2.11.0
>            Reporter: Igor Dvorzhak
>            Priority: Major
>             Fix For: 2.11.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> GCS IO 1.9.12-1.9.14 could send a large number of GCS list requests (if
> there are 1,000+ files in the folder) in the
> GoogleCloudStorageFileSystem#getFileInfo method.
> This issue is mitigated in GCS IO 1.9.15:
> [https://github.com/GoogleCloudPlatform/bigdata-interop/releases/tag/v1.9.15]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (BEAM-6697) ParquetIO Performance test is failing on (GCS filesystem)
[ https://issues.apache.org/jira/browse/BEAM-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777392#comment-16777392 ]

Igor Dvorzhak commented on BEAM-6697:
-------------------------------------

GCS IO 1.9.16 with the fix was just released: https://github.com/GoogleCloudPlatform/bigdata-interop/releases/tag/v1.9.16

> ParquetIO Performance test is failing on (GCS filesystem)
> ---------------------------------------------------------
>
>                 Key: BEAM-6697
>                 URL: https://issues.apache.org/jira/browse/BEAM-6697
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-parquet, test-failures
>            Reporter: Lukasz Gajowy
>            Priority: Blocker
>             Fix For: 2.11.0
>
>
> Relevant failure logs:
> {code:java}
> Caused by: java.lang.RuntimeException:
> org.apache.beam.sdk.io.parquet.ParquetIO$ReadFiles$BeamParquetInputFile@2de8303e
> is not a Parquet file (too small length: -1)
> 	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:514)
> 	at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:689)
> 	at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:595)
> 	at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:152)
> 	at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
> 	at org.apache.beam.sdk.io.parquet.ParquetIO$ReadFiles$ReadFn.processElement(ParquetIO.java:221){code}
>
> Full logs can be found here:
> [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_ParquetIOIT/|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_ParquetIOIT/1096/console]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (BEAM-6697) ParquetIO Performance test is failing on (GCS filesystem)
[ https://issues.apache.org/jira/browse/BEAM-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777332#comment-16777332 ]

Igor Dvorzhak commented on BEAM-6697:
-------------------------------------

This happens because the `GoogleCloudStorageReadChannel` constructor doesn't initialize GCS object metadata (which includes the object size); it is initialized lazily during the first read. Metadata is initialized eagerly only if the `GoogleCloudStorageReadChannel` is created via the `GoogleCloudStorage.open()` method.

It's fixed [here|https://github.com/GoogleCloudPlatform/bigdata-interop/commit/8f6443bfd6ee821c5667dd2811cf3fe03167b755] and will be released in GCS connector 1.9.16 in a couple of hours.

> ParquetIO Performance test is failing on (GCS filesystem)
> ---------------------------------------------------------
>
>                 Key: BEAM-6697
>                 URL: https://issues.apache.org/jira/browse/BEAM-6697
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-parquet, test-failures
>            Reporter: Lukasz Gajowy
>            Priority: Blocker
>             Fix For: 2.11.0
>
>
> Relevant failure logs:
> {code:java}
> Caused by: java.lang.RuntimeException:
> org.apache.beam.sdk.io.parquet.ParquetIO$ReadFiles$BeamParquetInputFile@2de8303e
> is not a Parquet file (too small length: -1)
> 	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:514)
> 	at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:689)
> 	at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:595)
> 	at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:152)
> 	at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
> 	at org.apache.beam.sdk.io.parquet.ParquetIO$ReadFiles$ReadFn.processElement(ParquetIO.java:221){code}
>
> Full logs can be found here:
> [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_ParquetIOIT/|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_ParquetIOIT/1096/console]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Comment Edited] (BEAM-6697) ParquetIO Performance test is failing on (GCS filesystem)
[ https://issues.apache.org/jira/browse/BEAM-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777332#comment-16777332 ]

Igor Dvorzhak edited comment on BEAM-6697 at 2/25/19 10:15 PM:
---------------------------------------------------------------

This happens because the `GoogleCloudStorageReadChannel` constructor doesn't initialize GCS object metadata (which includes the object size); it is initialized lazily during the first read. Metadata is initialized eagerly only if the `GoogleCloudStorageReadChannel` is created via the `GoogleCloudStorage.open()` method.

It's fixed [here|https://github.com/GoogleCloudPlatform/bigdata-interop/commit/8f6443bfd6ee821c5667dd2811cf3fe03167b755] and will be released in GCS connector 1.9.16 in a couple of hours.

was (Author: medb):
This happens is because `GoogleCloudStorageReadChannel` constructor doesn't initialize GCS metadata (includes object size) - it's initialized lazily during first read. Metadata initialized eagerly only if `GoogleCloudStorageReadChannel` created via `GoogleCloudStorage.open()` method.

It's fixed [here|https://github.com/GoogleCloudPlatform/bigdata-interop/commit/8f6443bfd6ee821c5667dd2811cf3fe03167b755] and will be release in GCS connector 1.9.16 in couple hours.
> ParquetIO Performance test is failing on (GCS filesystem)
> ---------------------------------------------------------
>
>                 Key: BEAM-6697
>                 URL: https://issues.apache.org/jira/browse/BEAM-6697
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-parquet, test-failures
>            Reporter: Lukasz Gajowy
>            Priority: Blocker
>             Fix For: 2.11.0
>
>
> Relevant failure logs:
> {code:java}
> Caused by: java.lang.RuntimeException:
> org.apache.beam.sdk.io.parquet.ParquetIO$ReadFiles$BeamParquetInputFile@2de8303e
> is not a Parquet file (too small length: -1)
> 	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:514)
> 	at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:689)
> 	at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:595)
> 	at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:152)
> 	at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
> 	at org.apache.beam.sdk.io.parquet.ParquetIO$ReadFiles$ReadFn.processElement(ParquetIO.java:221){code}
>
> Full logs can be found here:
> [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_ParquetIOIT/|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_ParquetIOIT/1096/console]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
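The lazy-initialization failure mode described in the comment can be sketched with a toy read channel. The classes below are hypothetical stand-ins, not the actual gcsio code: a caller that checks the size before the first read sees -1, which is exactly the "too small length: -1" Parquet rejects.

```java
// Minimal model of lazy vs. eager metadata initialization in a read channel.
class LazyReadChannel {
    private long size = -1;       // metadata (object size) not fetched yet
    private boolean initialized = false;

    // Pre-fix behavior: callers that check size() before the first read
    // observe -1, because metadata is only fetched lazily in read().
    long size() {
        return size;
    }

    int read(byte[] dst) {
        if (!initialized) {
            initializeMetadata(); // lazy: too late for size()-first callers
        }
        return 0;                 // actual I/O elided
    }

    // The fix is to call this eagerly when the channel is created
    // (as GoogleCloudStorage.open() already did).
    void initializeMetadata() {
        size = 1024;              // pretend the object is 1 KiB
        initialized = true;
    }
}

public class Demo {
    public static void main(String[] args) {
        LazyReadChannel lazy = new LazyReadChannel();
        System.out.println(lazy.size());   // -1: the Parquet failure mode

        LazyReadChannel eager = new LazyReadChannel();
        eager.initializeMetadata();        // eager init, as in the fix
        System.out.println(eager.size());  // 1024
    }
}
```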
[jira] [Created] (BEAM-6736) Upgrade gcsio dependency to 1.9.15
Igor Dvorzhak created BEAM-6736:
-----------------------------------

             Summary: Upgrade gcsio dependency to 1.9.15
                 Key: BEAM-6736
                 URL: https://issues.apache.org/jira/browse/BEAM-6736
             Project: Beam
          Issue Type: Bug
          Components: io-java-gcp, sdk-java-core
    Affects Versions: 2.10.0
            Reporter: Igor Dvorzhak

GCS IO 1.9.12-1.9.14 could send a large number of GCS list requests (if there are 1,000+ files in the folder) in the GoogleCloudStorage#getFileInfo method.

This issue is mitigated in GCS IO 1.9.15: https://github.com/GoogleCloudPlatform/bigdata-interop/releases/tag/v1.9.15

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (BEAM-6736) Upgrade gcsio dependency to 1.9.15
[ https://issues.apache.org/jira/browse/BEAM-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Dvorzhak updated BEAM-6736:
--------------------------------
    Description:
GCS IO 1.9.12-1.9.14 could send a large number of GCS list requests (if there are 1,000+ files in the folder) in the GoogleCloudStorageFileSystem#getFileInfo method.

This issue is mitigated in GCS IO 1.9.15: [https://github.com/GoogleCloudPlatform/bigdata-interop/releases/tag/v1.9.15]

  was:
GCS IO 1.9.12-1.9.14 could send large number of GCS list requests (if there are a 1000+ of files in the folder) in GoogleCloudStorage#getFileInfo method.

This issue is mitigated in GCS IO 1.9.15: https://github.com/GoogleCloudPlatform/bigdata-interop/releases/tag/v1.9.15

> Upgrade gcsio dependency to 1.9.15
> ----------------------------------
>
>                 Key: BEAM-6736
>                 URL: https://issues.apache.org/jira/browse/BEAM-6736
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp, sdk-java-core
>    Affects Versions: 2.10.0
>            Reporter: Igor Dvorzhak
>            Priority: Major
>
> GCS IO 1.9.12-1.9.14 could send a large number of GCS list requests (if
> there are 1,000+ files in the folder) in the
> GoogleCloudStorageFileSystem#getFileInfo method.
> This issue is mitigated in GCS IO 1.9.15:
> [https://github.com/GoogleCloudPlatform/bigdata-interop/releases/tag/v1.9.15]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
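A plausible reading of the 1,000-file threshold mentioned in the description: the GCS JSON API's Objects.list returns at most 1,000 results per page by default, so any code that fully lists a "folder" to resolve one path pays one request per page. The toy model below illustrates the arithmetic only; it is not the actual gcsio implementation, and the release notes linked above describe the real mitigation.

```java
// Toy model of paginated GCS listing cost: resolving a directory by fully
// listing it needs ceil(objects / pageSize) list requests, so folders with
// more than one page (1,000+ objects at the default page size) multiply
// round trips.
public class ListPagesDemo {
    static int listRequests(int objects, int pageSize) {
        if (objects == 0) {
            return 1;  // still one request to learn the listing is empty
        }
        return (objects + pageSize - 1) / pageSize;  // one request per page
    }

    public static void main(String[] args) {
        System.out.println(listRequests(999, 1000));    // 1
        System.out.println(listRequests(1001, 1000));   // 2
        System.out.println(listRequests(25000, 1000));  // 25
    }
}
```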