[ https://issues.apache.org/jira/browse/BEAM-5959?focusedWorklogId=177431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-177431 ]
ASF GitHub Bot logged work on BEAM-5959:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Dec/18 11:29
            Start Date: 20/Dec/18 11:29
    Worklog Time Spent: 10m
      Work Description: lgajowy commented on a change in pull request #7266: [BEAM-5959] Add performance testing for writing many files
URL: https://github.com/apache/beam/pull/7266#discussion_r243233574

 ##########
 File path: sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
 ##########
 @@ -77,12 +89,40 @@ public static void setup() {
      numberOfTextLines = options.getNumberOfRecords();
      filenamePrefix = appendTimestampSuffix(options.getFilenamePrefix());
      compressionType = Compression.valueOf(options.getCompressionType());
 +    numShards = options.getNumShards();
 +    bigQueryDataset = options.getBigQueryDataset();
 +    bigQueryTable = options.getBigQueryTable();
 +  }
 +
 +  private void publishGcsResults(PipelineResult result) {
 +    MetricsReader metricsReader =
 +        new MetricsReader(result, "org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem");
 +    long numCopies = metricsReader.getCounterMetric("num_copies");
 +    long copyTimeMsec = metricsReader.getCounterMetric("copy_time_msec");
 +    if (numCopies < 0 || copyTimeMsec < 0) {
 +      return;
 +    }
 +    long copiesPerSec = (long) (numCopies / (copyTimeMsec / 1e3));
 +    LOG.info("GCS copies / sec: {}", copiesPerSec);
 +
 +    if (bigQueryDataset != null && bigQueryTable != null) {
 +      Timestamp timestamp = Timestamp.now();

 Review comment:
   Huh! So far I always used Long millis for this and passed it to BQ as the timestamp. It had a flaw: I needed to divide by 1000, since BQ accepts seconds only. Now I know there is a `Timestamp` type. Thanks for showing me this. :D
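The hunk computes a copies-per-second rate from two `GcsFileSystem` counters. The review comment contrasts a `Timestamp` value with epoch milliseconds divided by 1000. Below is a minimal, standalone Java sketch of both calculations. The class `GcsCopyMetricsSketch`, its method names, and the `-1` "nothing to report" return value are hypothetical; they are not Beam code. The sketch also adds one guard that the hunk does not have. When `copyTimeMsec == 0` and `numCopies > 0`, the expression `numCopies / (copyTimeMsec / 1e3)` evaluates to positive infinity, and Java's narrowing cast converts that to `Long.MAX_VALUE`.

```java
import java.time.Instant;

/**
 * Hypothetical helper sketching the two calculations discussed in the hunk and review comment.
 * Not part of Beam; the names and the -1 sentinel are illustrative assumptions.
 */
final class GcsCopyMetricsSketch {
  private GcsCopyMetricsSketch() {}

  /**
   * Copies per second from a copy count and a total copy time in milliseconds.
   * Negative inputs are treated as "metric unavailable", mirroring the hunk's check.
   * A zero copy time is also rejected here (an addition). Both cases return -1.
   */
  static long copiesPerSec(long numCopies, long copyTimeMsec) {
    if (numCopies < 0 || copyTimeMsec <= 0) {
      return -1;
    }
    // copyTimeMsec / 1e3 converts milliseconds to fractional seconds; the cast truncates.
    return (long) (numCopies / (copyTimeMsec / 1e3));
  }

  /** Epoch milliseconds as fractional seconds, keeping the sub-second part. */
  static double epochSecondsFromMillis(long epochMillis) {
    return epochMillis / 1000.0;
  }

  /** The same value computed from a java.time.Instant instead of raw millis. */
  static double epochSecondsFromInstant(Instant instant) {
    return instant.getEpochSecond() + instant.getNano() / 1_000_000_000.0;
  }
}
```

For example, 500 copies over 2000 ms gives 250 copies per second. The unguarded expression with a zero copy time would report `Long.MAX_VALUE` instead.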
Issue Time Tracking
-------------------

    Worklog Id:     (was: 177431)
    Time Spent: 11h 40m  (was: 11.5h)

> Add Cloud KMS support to GCS copies
> -----------------------------------
>
>                 Key: BEAM-5959
>                 URL: https://issues.apache.org/jira/browse/BEAM-5959
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp, sdk-py-core
>            Reporter: Udi Meiri
>            Assignee: Udi Meiri
>            Priority: Major
>          Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> Beam SDK currently uses the CopyTo GCS API call, which doesn't support
> copying objects that use Customer Managed Encryption Keys (CMEK).
> CMEKs are managed in Cloud KMS.
> Items (for Java and Python SDKs):
> - Update clients to versions that support KMS keys.
> - Change copyTo API calls to use rewriteTo (Python: directly; Java:
> possibly by converting the copyTo API call to use the client library).
> - Add unit tests.
> - Add basic tests (DirectRunner and GCS buckets with CMEK).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
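The issue proposes replacing copyTo with a rewrite-based copy, because copyTo does not handle CMEK-protected objects. One practical difference is that a single GCS rewrite request may not finish the copy. Each response reports whether the rewrite is done, and if it is not, the response carries a rewrite token for the next call. The sketch below shows that loop in Java against a hypothetical `RewriteClient` interface and `RewriteStep` result type. Both are invented for illustration and are not Beam or Google client library types.

```java
import java.util.Objects;

/** Hypothetical result of one rewrite request: whether the copy finished, plus a continuation token. */
final class RewriteStep {
  final boolean done;
  final String rewriteToken; // null once done is true

  RewriteStep(boolean done, String rewriteToken) {
    this.done = done;
    this.rewriteToken = rewriteToken;
  }
}

/** Hypothetical client abstraction over a storage service's rewrite call. */
interface RewriteClient {
  RewriteStep rewrite(
      String srcBucket, String srcObject, String dstBucket, String dstObject, String rewriteToken);
}

final class RewriteCopySketch {
  private RewriteCopySketch() {}

  /**
   * Copies one object by calling rewrite until the service reports completion.
   * The first call sends no token; each later call sends the token from the previous response.
   * Returns the number of rewrite calls made.
   */
  static int copy(
      RewriteClient client, String srcBucket, String srcObject, String dstBucket, String dstObject) {
    Objects.requireNonNull(client, "client");
    String token = null;
    int calls = 0;
    while (true) {
      RewriteStep step = client.rewrite(srcBucket, srcObject, dstBucket, dstObject, token);
      calls++;
      if (step.done) {
        return calls;
      }
      if (step.rewriteToken == null) {
        // An unfinished rewrite without a token cannot be resumed.
        throw new IllegalStateException("rewrite not done but no token returned");
      }
      token = step.rewriteToken;
    }
  }
}
```

In this sketch, the loop is the same whether the rewrite is issued directly or through a client library. Only the implementation behind `RewriteClient.rewrite` would change.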