[jira] [Work logged] (BEAM-4049) Improve write throughput of CassandraIO

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4049?focusedWorklogId=152056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152056
 ]

ASF GitHub Bot logged work on BEAM-4049:


Author: ASF GitHub Bot
Created on: 07/Oct/18 11:33
Start Date: 07/Oct/18 11:33
Worklog Time Spent: 10m 
  Work Description: Arqu commented on issue #5112: [BEAM-4049] Improve 
CassandraIO write throughput by performing async queries
URL: https://github.com/apache/beam/pull/5112#issuecomment-427646175
 
 
   I've currently just fixed the beam-sdks-java-io-cassandra to 2.3.0 while 
keeping the rest to 2.5.0


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152056)
Time Spent: 7h 50m  (was: 7h 40m)

> Improve write throughput of CassandraIO
> ---
>
> Key: BEAM-4049
> URL: https://issues.apache.org/jira/browse/BEAM-4049
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.4.0
>Reporter: Alexander Dejanovski
>Assignee: Alexander Dejanovski
>Priority: Major
>  Labels: performance
> Fix For: 2.5.0
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> The CassandraIO currently uses the mapper to perform writes in a synchronous 
> fashion. 
> This implies that writes are serialized and is a very suboptimal way of 
> writing to Cassandra.
> The IO should use the saveAsync() method instead of save() and should wait 
> for completion each time 100 queries are in flight, in order to avoid 
> overwhelming clusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5105) Move load job poll to finishBundle() method to better parallelize execution

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5105?focusedWorklogId=152057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152057
 ]

ASF GitHub Bot logged work on BEAM-5105:


Author: ASF GitHub Bot
Created on: 07/Oct/18 11:42
Start Date: 07/Oct/18 11:42
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #6416: 
[BEAM-5105] Better parallelize BigQuery load jobs
URL: https://github.com/apache/beam/pull/6416#discussion_r223210012
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOWriteTest.java
 ##
 @@ -757,7 +757,7 @@ public void testWriteUnknown() throws Exception {
 .withoutValidation());
 
 thrown.expect(RuntimeException.class);
-thrown.expectMessage("Failed to create load job");
+thrown.expectMessage("Failed to create job");
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152057)
Time Spent: 1h 20m  (was: 1h 10m)

> Move load job poll to finishBundle() method to better parallelize execution
> ---
>
> Key: BEAM-5105
> URL: https://issues.apache.org/jira/browse/BEAM-5105
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Chamikara Jayalath
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> It appears that when we write to BigQuery using WriteTablesDoFn we start a 
> load job and wait for that job to finish.
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L318]
>  
> In cases where we are trying to write a PCollection of tables (for example, 
> when user use dynamic destinations feature) this relies on dynamic work 
> rebalancing to parallellize execution of load jobs. If the runner does not 
> support dynamic work rebalancing or does not execute dynamic work rebalancing 
> from some reason this could have significant performance drawbacks. For 
> example, scheduling times for load jobs will add up.
>  
> A better approach might be to start load jobs at process() method but wait 
> for all load jobs to finish at finishBundle() method. This will parallelize 
> any overheads as well as job execution (assuming more than one job is 
> schedule by BQ.).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5105) Move load job poll to finishBundle() method to better parallelize execution

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5105?focusedWorklogId=152061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152061
 ]

ASF GitHub Bot logged work on BEAM-5105:


Author: ASF GitHub Bot
Created on: 07/Oct/18 11:42
Start Date: 07/Oct/18 11:42
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #6416: 
[BEAM-5105] Better parallelize BigQuery load jobs
URL: https://github.com/apache/beam/pull/6416#discussion_r223210023
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteRename.java
 ##
 @@ -88,148 +102,128 @@ public void processElement(ProcessContext c) throws 
Exception {
 }
 for (Map.Entry> entry : 
tempTables.asMap().entrySet()) {
   // Process each destination table.
-  writeRename(entry.getKey(), entry.getValue(), c);
+  // Do not copy if no temp tables are provided.
+  if (!entry.getValue().isEmpty()) {
+pendingJobs.add(startWriteRename(entry.getKey(), entry.getValue(), c));
+  }
 }
   }
 
-  private void writeRename(
+  @FinishBundle
+  public void finishBundle(FinishBundleContext c) throws Exception {
+DatasetService datasetService =
+
bqServices.getDatasetService(c.getPipelineOptions().as(BigQueryOptions.class));
+PendingJobManager jobManager = new PendingJobManager();
+for (PendingJobData pendingJob : pendingJobs) {
+  jobManager.addPendingJob(
+  pendingJob.retryJob,
+  j -> {
+try {
+  if (pendingJob.tableDestination.getTableDescription() != null) {
+TableReference ref = 
pendingJob.tableDestination.getTableReference();
+datasetService.patchTableDescription(
+ref.clone()
+
.setTableId(BigQueryHelpers.stripPartitionDecorator(ref.getTableId())),
+pendingJob.tableDestination.getTableDescription());
+  }
+  removeTemporaryTables(datasetService, pendingJob.tempTables);
+  return null;
 
 Review comment:
   fitto


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152061)
Time Spent: 2h  (was: 1h 50m)

> Move load job poll to finishBundle() method to better parallelize execution
> ---
>
> Key: BEAM-5105
> URL: https://issues.apache.org/jira/browse/BEAM-5105
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Chamikara Jayalath
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> It appears that when we write to BigQuery using WriteTablesDoFn we start a 
> load job and wait for that job to finish.
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L318]
>  
> In cases where we are trying to write a PCollection of tables (for example, 
> when user use dynamic destinations feature) this relies on dynamic work 
> rebalancing to parallellize execution of load jobs. If the runner does not 
> support dynamic work rebalancing or does not execute dynamic work rebalancing 
> from some reason this could have significant performance drawbacks. For 
> example, scheduling times for load jobs will add up.
>  
> A better approach might be to start load jobs at process() method but wait 
> for all load jobs to finish at finishBundle() method. This will parallelize 
> any overheads as well as job execution (assuming more than one job is 
> schedule by BQ.).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5105) Move load job poll to finishBundle() method to better parallelize execution

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5105?focusedWorklogId=152058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152058
 ]

ASF GitHub Bot logged work on BEAM-5105:


Author: ASF GitHub Bot
Created on: 07/Oct/18 11:42
Start Date: 07/Oct/18 11:42
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #6416: 
[BEAM-5105] Better parallelize BigQuery load jobs
URL: https://github.com/apache/beam/pull/6416#discussion_r223210015
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java
 ##
 @@ -79,24 +86,202 @@ public RetryJobIdResult(RetryJobId jobId, boolean 
shouldRetry) {
 }
   }
 
+  // A class that waits for pending jobs, retrying them according to policy if 
they fail.
+  static class PendingJobManager {
+private static class JobInfo {
+  private final PendingJob pendingJob;
+  @Nullable private final SerializableFunction 
onSuccess;
+
+  public JobInfo(PendingJob pendingJob, SerializableFunction onSuccess) {
+this.pendingJob = pendingJob;
+this.onSuccess = onSuccess;
+  }
+}
+
+private List pendingJobs = Lists.newArrayList();
+
+// Add a pending job and a function to call when the job has completed 
successfully.
+PendingJobManager addPendingJob(
+PendingJob pendingJob, @Nullable SerializableFunction onSuccess) {
+  this.pendingJobs.add(new JobInfo(pendingJob, onSuccess));
+  return this;
+}
+
+void waitForDone() throws Exception {
+  BackOff backoff =
+  BackOffAdapter.toGcpBackOff(
+  FluentBackoff.DEFAULT
+  .withMaxRetries(Integer.MAX_VALUE)
+  .withInitialBackoff(Duration.standardSeconds(1))
+  .withMaxBackoff(Duration.standardMinutes(1))
+  .backoff());
+  Sleeper sleeper = Sleeper.DEFAULT;
+  while (!pendingJobs.isEmpty()) {
+List retryJobs = Lists.newArrayList();
+for (JobInfo jobInfo : pendingJobs) {
+  if (jobInfo.pendingJob.pollJob()) {
+// Job has completed successfully.
+Exception e = jobInfo.onSuccess.apply(null);
+if (e != null) {
+  throw e;
+}
+  } else {
+// Job failed, start it again. If it has hit the maximum number of 
retries then runJob
+// will throw an exception.
+jobInfo.pendingJob.runJob();
+retryJobs.add(jobInfo);
+  }
+}
+pendingJobs = retryJobs;
+if (!pendingJobs.isEmpty()) {
+  // Sleep before retrying.
+  nextBackOff(sleeper, backoff);
+}
+  }
+}
+
+/** Identical to {@link BackOffUtils#next} but without checked 
IOException. */
+private static boolean nextBackOff(Sleeper sleeper, BackOff backoff)
+throws InterruptedException {
+  try {
+return BackOffUtils.next(sleeper, backoff);
+  } catch (IOException e) {
+throw new RuntimeException(e);
+  }
+}
+  }
+
+  static class PendingJob {
+private final SerializableFunction executeJob;
+private final SerializableFunction pollJob;
+private final String projectId;
+private final String bqLocation;
+private final JobService jobService;
+private final int maxRetries;
+private int currentAttempt;
+RetryJobId currentJobId;
+Job lastJobAttempted;
+boolean started;
+
+PendingJob(
+SerializableFunction executeJob,
+SerializableFunction pollJob,
+String projectId,
+String bqLocation,
+JobService jobService,
+int maxRetries,
+String jobIdPrefix) {
+  this.executeJob = executeJob;
+  this.pollJob = pollJob;
+  this.projectId = projectId;
+  this.bqLocation = bqLocation;
+  this.jobService = jobService;
+  this.maxRetries = maxRetries;
+  this.currentAttempt = 0;
+  currentJobId = new RetryJobId(jobIdPrefix, 0);
+  this.started = false;
+}
+
+// Run the job.
+void runJob() throws InterruptedException, IOException {
+  ++currentAttempt;
+  if (!shouldRetry()) {
+throw new RuntimeException(
+String.format(
+"Failed to create job with prefix %s, "
++ "reached max retries: %d, last failed job: %s.",
+currentJobId.getJobIdPrefix(),
+maxRetries,
+BigQueryHelpers.jobToPrettyString(lastJobAttempted)));
+  }
+
+  try {
+executeJob.apply(currentJobId);
+  } catch (RuntimeException e) {
+LOG.warn("Job {} failed with {}", currentJobId.getJobId(), e);
+// It's possible that the job actually made it to BQ even though we 
got a failure here.
+// For example, the response from BQ may h

[jira] [Work logged] (BEAM-5105) Move load job poll to finishBundle() method to better parallelize execution

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5105?focusedWorklogId=152059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152059
 ]

ASF GitHub Bot logged work on BEAM-5105:


Author: ASF GitHub Bot
Created on: 07/Oct/18 11:42
Start Date: 07/Oct/18 11:42
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #6416: 
[BEAM-5105] Better parallelize BigQuery load jobs
URL: https://github.com/apache/beam/pull/6416#discussion_r223210018
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java
 ##
 @@ -79,24 +86,202 @@ public RetryJobIdResult(RetryJobId jobId, boolean 
shouldRetry) {
 }
   }
 
+  // A class that waits for pending jobs, retrying them according to policy if 
they fail.
+  static class PendingJobManager {
+private static class JobInfo {
+  private final PendingJob pendingJob;
+  @Nullable private final SerializableFunction 
onSuccess;
+
+  public JobInfo(PendingJob pendingJob, SerializableFunction onSuccess) {
+this.pendingJob = pendingJob;
+this.onSuccess = onSuccess;
+  }
+}
+
+private List pendingJobs = Lists.newArrayList();
+
+// Add a pending job and a function to call when the job has completed 
successfully.
+PendingJobManager addPendingJob(
+PendingJob pendingJob, @Nullable SerializableFunction onSuccess) {
+  this.pendingJobs.add(new JobInfo(pendingJob, onSuccess));
+  return this;
+}
+
+void waitForDone() throws Exception {
+  BackOff backoff =
+  BackOffAdapter.toGcpBackOff(
+  FluentBackoff.DEFAULT
+  .withMaxRetries(Integer.MAX_VALUE)
+  .withInitialBackoff(Duration.standardSeconds(1))
+  .withMaxBackoff(Duration.standardMinutes(1))
+  .backoff());
+  Sleeper sleeper = Sleeper.DEFAULT;
+  while (!pendingJobs.isEmpty()) {
+List retryJobs = Lists.newArrayList();
+for (JobInfo jobInfo : pendingJobs) {
+  if (jobInfo.pendingJob.pollJob()) {
+// Job has completed successfully.
+Exception e = jobInfo.onSuccess.apply(null);
+if (e != null) {
+  throw e;
+}
+  } else {
+// Job failed, start it again. If it has hit the maximum number of 
retries then runJob
+// will throw an exception.
+jobInfo.pendingJob.runJob();
+retryJobs.add(jobInfo);
 
 Review comment:
   good catch. fixed


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152059)
Time Spent: 1h 40m  (was: 1.5h)

> Move load job poll to finishBundle() method to better parallelize execution
> ---
>
> Key: BEAM-5105
> URL: https://issues.apache.org/jira/browse/BEAM-5105
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Chamikara Jayalath
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> It appears that when we write to BigQuery using WriteTablesDoFn we start a 
> load job and wait for that job to finish.
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L318]
>  
> In cases where we are trying to write a PCollection of tables (for example, 
> when user use dynamic destinations feature) this relies on dynamic work 
> rebalancing to parallellize execution of load jobs. If the runner does not 
> support dynamic work rebalancing or does not execute dynamic work rebalancing 
> from some reason this could have significant performance drawbacks. For 
> example, scheduling times for load jobs will add up.
>  
> A better approach might be to start load jobs at process() method but wait 
> for all load jobs to finish at finishBundle() method. This will parallelize 
> any overheads as well as job execution (assuming more than one job is 
> schedule by BQ.).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5105) Move load job poll to finishBundle() method to better parallelize execution

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5105?focusedWorklogId=152060&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152060
 ]

ASF GitHub Bot logged work on BEAM-5105:


Author: ASF GitHub Bot
Created on: 07/Oct/18 11:42
Start Date: 07/Oct/18 11:42
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #6416: 
[BEAM-5105] Better parallelize BigQuery load jobs
URL: https://github.com/apache/beam/pull/6416#discussion_r223210021
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java
 ##
 @@ -166,22 +184,64 @@ public void processElement(ProcessContext c) throws 
Exception {
   (c.pane().getIndex() == 0) ? firstPaneWriteDisposition : 
WriteDisposition.WRITE_APPEND;
   CreateDisposition createDisposition =
   (c.pane().getIndex() == 0) ? firstPaneCreateDisposition : 
CreateDisposition.CREATE_NEVER;
-  load(
-  
bqServices.getJobService(c.getPipelineOptions().as(BigQueryOptions.class)),
-  
bqServices.getDatasetService(c.getPipelineOptions().as(BigQueryOptions.class)),
-  jobIdPrefix,
-  tableReference,
-  tableDestination.getTimePartitioning(),
-  tableSchema,
-  partitionFiles,
-  writeDisposition,
-  createDisposition,
-  tableDestination.getTableDescription());
-  c.output(
-  mainOutputTag, KV.of(tableDestination, 
BigQueryHelpers.toJsonString(tableReference)));
-  for (String file : partitionFiles) {
-c.output(temporaryFilesTag, file);
+
+  BigQueryHelpers.PendingJob retryJob =
+  startLoad(
+  
bqServices.getJobService(c.getPipelineOptions().as(BigQueryOptions.class)),
+  
bqServices.getDatasetService(c.getPipelineOptions().as(BigQueryOptions.class)),
+  jobIdPrefix,
+  tableReference,
+  tableDestination.getTimePartitioning(),
+  tableSchema,
+  partitionFiles,
+  writeDisposition,
+  createDisposition);
+  pendingJobs.add(
+  new PendingJobData(window, retryJob, partitionFiles, 
tableDestination, tableReference));
+}
+
+@FinishBundle
+public void finishBundle(FinishBundleContext c) throws Exception {
+  DatasetService datasetService =
+  
bqServices.getDatasetService(c.getPipelineOptions().as(BigQueryOptions.class));
+
+  PendingJobManager jobManager = new PendingJobManager();
+  for (PendingJobData pendingJob : pendingJobs) {
+jobManager =
+jobManager.addPendingJob(
+pendingJob.retryJob,
+// Lambda called when the job is done.
+j -> {
+  try {
+if (pendingJob.tableDestination.getTableDescription() != 
null) {
+  TableReference ref = pendingJob.tableReference;
+  datasetService.patchTableDescription(
+  ref.clone()
+  .setTableId(
+  
BigQueryHelpers.stripPartitionDecorator(ref.getTableId())),
+  pendingJob.tableDestination.getTableDescription());
+}
+c.output(
+mainOutputTag,
+KV.of(
+pendingJob.tableDestination,
+
BigQueryHelpers.toJsonString(pendingJob.tableReference)),
+pendingJob.window.maxTimestamp(),
+pendingJob.window);
+for (String file : pendingJob.partitionFiles) {
+  c.output(
+  temporaryFilesTag,
+  file,
+  pendingJob.window.maxTimestamp(),
+  pendingJob.window);
+}
+return null;
 
 Review comment:
   function needs to return something.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152060)
Time Spent: 1h 50m  (was: 1h 40m)

> Move load job poll to finishBundle() method to better parallelize execution
> ---
>
> Key: BEAM-5105
> URL: https://issues.apache.org/jira/browse/BEAM-5105
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Chamikara

Build failed in Jenkins: beam_PostCommit_Website_Publish #124

2018-10-07 Thread Apache Jenkins Server
See 


--
[...truncated 7.65 KB...]
  No history is available.
:buildSrc:jar (Thread[Task worker for ':buildSrc' Thread 3,5,main]) completed. 
Took 0.123 secs.
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
started.

> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
completed. Took 0.0 secs.
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
completed. Took 1.468 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) completed. Took 0.014 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) completed. Took 0.045 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
completed. Took 0.003 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) completed. Took 0.002 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) completed. Took 0.001 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
completed. Took 0.0 secs.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 2,5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output 
files.
:buildSrc:test (Th

Build failed in Jenkins: beam_PreCommit_Website_Cron #150

2018-10-07 Thread Apache Jenkins Server
See 


--
[...truncated 166.69 KB...]
  *  External link http://images/logos/runners/spark.png failed: response code 
0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/go.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/java.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/python.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/scala.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  http:// is an invalid URL (line 77)
 

  
- ./generated-content/get-started/downloads/index.html
  *  http:// is an invalid URL (line 77)
 

  
- ./generated-content/get-started/index.html
  *  External link http://get-started/beam-overview failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://get-started/mobile-gaming-example failed: response 
code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  http:// is an invalid URL (line 77)
 

  
- ./generated-content/get-started/mobile-gaming-example/index.html
  *  External link http://documentation/programming-guide/ failed: response 
code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://documentation/programming-guide/ failed: response 
code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/gaming-example-basic.png failed: response code 
0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/gaming-example-event-time-narrow.gif failed: 
response code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/gaming-example-proc-time-narrow.gif failed: 
response code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Somet

Build failed in Jenkins: beam_PreCommit_Website_Stage_GCS_Cron #12

2018-10-07 Thread Apache Jenkins Server
See 


--
[...truncated 24.99 KB...]
Successfully started process 'command 'docker''
Sending build context to Docker daemon  26.04MB
Step 1/7 : FROM ruby:2.5
2.5: Pulling from library/ruby
05d1a5232b46: Already exists
5cee356eda6b: Pulling fs layer
89d3385f0fd3: Pulling fs layer
80ae6b477848: Pulling fs layer
28bdf9e584cc: Pulling fs layer
bdeb28e714e4: Pulling fs layer
11cda9982b82: Pulling fs layer
33aaff5a89b2: Pulling fs layer
28bdf9e584cc: Waiting
bdeb28e714e4: Waiting
11cda9982b82: Waiting
89d3385f0fd3: Verifying Checksum
89d3385f0fd3: Download complete
5cee356eda6b: Download complete
bdeb28e714e4: Verifying Checksum
bdeb28e714e4: Download complete
80ae6b477848: Verifying Checksum
80ae6b477848: Download complete
33aaff5a89b2: Verifying Checksum
33aaff5a89b2: Download complete
5cee356eda6b: Pull complete
11cda9982b82: Verifying Checksum
11cda9982b82: Download complete
89d3385f0fd3: Pull complete
28bdf9e584cc: Verifying Checksum
28bdf9e584cc: Download complete
80ae6b477848: Pull complete
28bdf9e584cc: Pull complete
bdeb28e714e4: Pull complete
11cda9982b82: Pull complete
33aaff5a89b2: Pull complete
Digest: sha256:99a166f9b3bb87d53d377e5484799f37d486c37fb8c98df16d3e553b4f886048
Status: Downloaded newer image for ruby:2.5
 ---> 8e2b5b80415f
Step 2/7 : WORKDIR /ruby
 ---> d3f6fc10573a
Removing intermediate container c5af5ca52f1f
Step 3/7 : RUN gem install bundler
 ---> Running in fe322c5b758b
Successfully installed bundler-1.16.6
1 gem installed
 ---> 90a4b4948e76
Removing intermediate container fe322c5b758b
Step 4/7 : ADD Gemfile Gemfile.lock /ruby/
 ---> 3659d1337464
Removing intermediate container 7bde4c263a2d
Step 5/7 : RUN bundle install --deployment --path $GEM_HOME
 ---> Running in 3a16944c563d
Fetching gem metadata from https://rubygems.org/.
Fetching rake 12.3.0
Installing rake 12.3.0
Fetching concurrent-ruby 1.0.5
Installing concurrent-ruby 1.0.5
Fetching i18n 0.9.5
Installing i18n 0.9.5
Fetching minitest 5.11.3
Installing minitest 5.11.3
Fetching thread_safe 0.3.6
Installing thread_safe 0.3.6
Fetching tzinfo 1.2.5
Installing tzinfo 1.2.5
Fetching activesupport 4.2.10
Installing activesupport 4.2.10
Fetching public_suffix 3.0.2
Installing public_suffix 3.0.2
Fetching addressable 2.5.2
Installing addressable 2.5.2
Using bundler 1.16.6
Fetching colorator 1.1.0
Installing colorator 1.1.0
Fetching colorize 0.8.1
Installing colorize 0.8.1
Fetching ffi 1.9.21
Installing ffi 1.9.21 with native extensions
Fetching ethon 0.11.0
Installing ethon 0.11.0
Fetching forwardable-extended 2.6.0
Installing forwardable-extended 2.6.0
Fetching mercenary 0.3.6
Installing mercenary 0.3.6
Fetching mini_portile2 2.3.0
Installing mini_portile2 2.3.0
Fetching nokogiri 1.8.2
Installing nokogiri 1.8.2 with native extensions
Fetching parallel 1.12.1
Installing parallel 1.12.1
Fetching typhoeus 1.3.0
Installing typhoeus 1.3.0
Fetching yell 2.0.7
Installing yell 2.0.7
Fetching html-proofer 3.8.0
Installing html-proofer 3.8.0
Fetching rb-fsevent 0.10.2
Installing rb-fsevent 0.10.2
Fetching rb-inotify 0.9.10
Installing rb-inotify 0.9.10
Fetching sass-listen 4.0.0
Installing sass-listen 4.0.0
Fetching sass 3.5.5
Installing sass 3.5.5
Fetching jekyll-sass-converter 1.5.2
Installing jekyll-sass-converter 1.5.2
Fetching ruby_dep 1.5.0
Installing ruby_dep 1.5.0
Fetching listen 3.1.5
Installing listen 3.1.5
Fetching jekyll-watch 1.5.1
Installing jekyll-watch 1.5.1
Fetching kramdown 1.16.2
Installing kramdown 1.16.2
Fetching liquid 3.0.6
Installing liquid 3.0.6
Fetching pathutil 0.16.1
Installing pathutil 0.16.1
Fetching rouge 1.11.1
Installing rouge 1.11.1
Fetching safe_yaml 1.0.4
Installing safe_yaml 1.0.4
Fetching jekyll 3.2.0
Installing jekyll 3.2.0
Fetching jekyll-redirect-from 0.11.0
Installing jekyll-redirect-from 0.11.0
Fetching jekyll_github_sample 0.3.1
Installing jekyll_github_sample 0.3.1
Bundle complete! 7 Gemfile dependencies, 38 gems now installed.
Bundled gems are installed into `/usr/local/bundle`
 ---> 1286b0ca79dd
Removing intermediate container 3a16944c563d
Step 6/7 : ENV LC_ALL C.UTF-8
 ---> Running in d0dba0fb4ed5
 ---> 6693a6ef3baf
Removing intermediate container d0dba0fb4ed5
Step 7/7 : CMD sleep 3600
 ---> Running in 966b823c3b87
 ---> e3e41a8b72eb
Removing intermediate container 966b823c3b87
Successfully built e3e41a8b72eb
Successfully tagged beam-website:latest
:beam-website:buildDockerImage (Thread[Task worker for ':',5,main]) completed. 
Took 1 mins 45.287 secs.
:beam-website:createDockerContainer (Thread[Task worker for ':',5,main]) 
started.

> Task :beam-website:createDockerContainer
Caching disabled for task ':beam-website:createDockerContainer': Caching has 
not been enabled for the task
Task ':beam-website:createDockerContainer' is not up-to-date because:
  Task has not declared any outputs despite executing actions.
Starting process 'command '/bin/bash''. Working dir

Build failed in Jenkins: beam_PostCommit_Python_VR_Flink #264

2018-10-07 Thread Apache Jenkins Server
See 


--
[...truncated 51.04 MB...]
[GroupByKey -> 24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (3/16)] 
INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for 
GroupByKey -> 24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (3/16) 
(4fcb7ecd958e54becdbef63957c1acd2).
[ToKeyedWorkItem (10/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (10/16) (9198d38b18e61d6d063702bed9830375) switched from 
RUNNING to FINISHED.
[ToKeyedWorkItem (10/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (10/16) 
(9198d38b18e61d6d063702bed9830375).
[GroupByKey -> 24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (12/16)] 
INFO org.apache.flink.runtime.taskmanager.Task - Ensuring all FileSystem 
streams are closed for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (12/16) 
(7dc1082ceea4c048e84fbb7793181b5b) [FINISHED]
[GroupByKey -> 24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (3/16)] 
INFO org.apache.flink.runtime.taskmanager.Task - Ensuring all FileSystem 
streams are closed for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (3/16) 
(4fcb7ecd958e54becdbef63957c1acd2) [FINISHED]
[ToKeyedWorkItem (8/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (8/16) (3ed98a2fbc1b4cac069a49c3a1a9137b) switched from RUNNING 
to FINISHED.
[ToKeyedWorkItem (8/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (8/16) 
(3ed98a2fbc1b4cac069a49c3a1a9137b).
[ToKeyedWorkItem (10/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (10/16) 
(9198d38b18e61d6d063702bed9830375) [FINISHED]
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
7dc1082ceea4c048e84fbb7793181b5b.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (6/16) 
(7dafaa7c4bcdb9a2a8450d2a82879220) switched from RUNNING to FINISHED.
[ToKeyedWorkItem (8/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (8/16) 
(3ed98a2fbc1b4cac069a49c3a1a9137b) [FINISHED]
[ToKeyedWorkItem (6/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (6/16) (d58959dd0ae9a6475b6c925af3269cb4) switched from RUNNING 
to FINISHED.
[ToKeyedWorkItem (6/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (6/16) 
(d58959dd0ae9a6475b6c925af3269cb4).
[ToKeyedWorkItem (3/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (3/16) (f3bde14a7c3fa44c502906a46efd6cba) switched from RUNNING 
to FINISHED.
[ToKeyedWorkItem (3/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (3/16) 
(f3bde14a7c3fa44c502906a46efd6cba).
[ToKeyedWorkItem (6/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (6/16) 
(d58959dd0ae9a6475b6c925af3269cb4) [FINISHED]
[ToKeyedWorkItem (3/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (3/16) 
(f3bde14a7c3fa44c502906a46efd6cba) [FINISHED]
[GroupByKey -> 24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (16/16)] 
INFO org.apache.flink.runtime.taskmanager.Task - Ensuring all FileSystem 
streams are closed for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (16/16) 
(55db2d2299ede11aadb84145e98fc839) [FINISHED]
[ToKeyedWorkItem (9/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (9/16) (428cbc58f3f315615aaa0ffedab12f80) switched from RUNNING 
to FINISHED.
[ToKeyedWorkItem (9/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (9/16) 
(428cbc58f3f315615aaa0ffedab12f80).
[GroupByKey -> 24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (15/16)] 
INFO org.apache.flink.runtime.taskmanager.Task - Ensuring all FileSystem 
streams are closed for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (15/16) 
(6db44a250df4f82d07d8781f387daf09) [FINISHED]
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (2/16) 
(06a5f0b306574a8d4468188fac57b31a) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager fo

[jira] [Work logged] (BEAM-4049) Improve write throughput of CassandraIO

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4049?focusedWorklogId=152062&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152062
 ]

ASF GitHub Bot logged work on BEAM-4049:


Author: ASF GitHub Bot
Created on: 07/Oct/18 12:10
Start Date: 07/Oct/18 12:10
Worklog Time Spent: 10m 
  Work Description: script3r commented on issue #5112: [BEAM-4049] Improve 
CassandraIO write throughput by performing async queries
URL: https://github.com/apache/beam/pull/5112#issuecomment-427648485
 
 
   Yeah, I can confirm that the issue is persists and that it is unrelated to
   version mismatch.
   
   I also rolled back to 2.3.0 and it works. I suspect the issue is associated
   with this commit removing listenable future from guava during packaging.
   
   I’ll take a stab at fixing this issue within this week if the committer
   doesn’t get to it by then.
   
   On Sun, Oct 7, 2018 at 7:33 AM Asmir Avdicevic 
   wrote:
   
   > I've currently just fixed the beam-sdks-java-io-cassandra to 2.3.0 while
   > keeping the rest to 2.5.0
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > , or mute
   > the thread
   > 

   > .
   >
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152062)
Time Spent: 8h  (was: 7h 50m)

> Improve write throughput of CassandraIO
> ---
>
> Key: BEAM-4049
> URL: https://issues.apache.org/jira/browse/BEAM-4049
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.4.0
>Reporter: Alexander Dejanovski
>Assignee: Alexander Dejanovski
>Priority: Major
>  Labels: performance
> Fix For: 2.5.0
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> The CassandraIO currently uses the mapper to perform writes in a synchronous 
> fashion. 
> This implies that writes are serialized and is a very suboptimal way of 
> writing to Cassandra.
> The IO should use the saveAsync() method instead of save() and should wait 
> for completion each time 100 queries are in flight, in order to avoid 
> overwhelming clusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4049) Improve write throughput of CassandraIO

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4049?focusedWorklogId=152063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152063
 ]

ASF GitHub Bot logged work on BEAM-4049:


Author: ASF GitHub Bot
Created on: 07/Oct/18 12:13
Start Date: 07/Oct/18 12:13
Worklog Time Spent: 10m 
  Work Description: adejanovski commented on issue #5112: [BEAM-4049] 
Improve CassandraIO write throughput by performing async queries
URL: https://github.com/apache/beam/pull/5112#issuecomment-427648627
 
 
   How was the 2.5.0 packages built? With Maven or Gradle?
   In this PR, I used a method from the datastax Java driver which relies on 
Guava's ListenableFuture class. Since beam's pom relocated all of Guava into a 
custom package, it gave the error you mentioned.
   As a workaround, I added an exception in the pom file to avoid relocating 
ListenableFuture.
   I think I understood the project was moving to Gradle and as I didn't touch 
the Gradle build, ListenableFuture might still be relocated there.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152063)
Time Spent: 8h 10m  (was: 8h)

> Improve write throughput of CassandraIO
> ---
>
> Key: BEAM-4049
> URL: https://issues.apache.org/jira/browse/BEAM-4049
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.4.0
>Reporter: Alexander Dejanovski
>Assignee: Alexander Dejanovski
>Priority: Major
>  Labels: performance
> Fix For: 2.5.0
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> The CassandraIO currently uses the mapper to perform writes in a synchronous 
> fashion. 
> This implies that writes are serialized and is a very suboptimal way of 
> writing to Cassandra.
> The IO should use the saveAsync() method instead of save() and should wait 
> for completion each time 100 queries are in flight, in order to avoid 
> overwhelming clusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Py_VR_Dataflow #1281

2018-10-07 Thread Apache Jenkins Server
See 


--
[...truncated 76.18 KB...]
Collecting pyhamcrest (from -r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/PyHamcrest-1.9.0.tar.gz
  File was already downloaded 
/tmp/dataflow-requirements-cache/PyHamcrest-1.9.0.tar.gz
  File was already downloaded 
/tmp/dataflow-requirements-cache/PyHamcrest-1.9.0.tar.gz
Collecting mock (from -r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
Collecting mock (from -r postcommit_requirements.txt (line 2))
Collecting mock (from -r postcommit_requirements.txt (line 2))
Collecting mock (from -r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
Collecting mock (from -r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
Collecting mock (from -r postcommit_requirements.txt (line 2))
Collecting mock (from -r postcommit_requirements.txt (line 2))
Collecting mock (from -r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.11.0.tar.gz
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.11.0.tar.gz
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.11.0.tar.gz
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.11.0.tar.gz
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.11.0.tar.gz
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.11.0.tar.gz
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.11.0.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.11.0.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
  File was already downloaded 
/tmp/dataflow-requirem

[jira] [Work logged] (BEAM-4049) Improve write throughput of CassandraIO

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4049?focusedWorklogId=152064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152064
 ]

ASF GitHub Bot logged work on BEAM-4049:


Author: ASF GitHub Bot
Created on: 07/Oct/18 12:34
Start Date: 07/Oct/18 12:34
Worklog Time Spent: 10m 
  Work Description: Arqu commented on issue #5112: [BEAM-4049] Improve 
CassandraIO write throughput by performing async queries
URL: https://github.com/apache/beam/pull/5112#issuecomment-427649912
 
 
   I'm actually relying on Maven/pom. Tried also adding the exclusion to my own 
pom file which resulted in the same situation.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152064)
Time Spent: 8h 20m  (was: 8h 10m)

> Improve write throughput of CassandraIO
> ---
>
> Key: BEAM-4049
> URL: https://issues.apache.org/jira/browse/BEAM-4049
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.4.0
>Reporter: Alexander Dejanovski
>Assignee: Alexander Dejanovski
>Priority: Major
>  Labels: performance
> Fix For: 2.5.0
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> The CassandraIO currently uses the mapper to perform writes in a synchronous 
> fashion. 
> This implies that writes are serialized and is a very suboptimal way of 
> writing to Cassandra.
> The IO should use the saveAsync() method instead of save() and should wait 
> for completion each time 100 queries are in flight, in order to avoid 
> overwhelming clusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5620) Some tests use assertItemsEqual method, not available in Python 3

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5620?focusedWorklogId=152065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152065
 ]

ASF GitHub Bot logged work on BEAM-5620:


Author: ASF GitHub Bot
Created on: 07/Oct/18 12:44
Start Date: 07/Oct/18 12:44
Worklog Time Spent: 10m 
  Work Description: Fematich opened a new pull request #6599: [BEAM-5620] 
rename assertItemsEqual to assertCounEqual for PY3 compatibility
URL: https://github.com/apache/beam/pull/6599
 
 
   Replaced all `assertItemsEqual` by assigning it to `assertCountEqual` on 
Python 2, as discussed in [BEAM-5620] and illustrated in 
https://github.com/apache/beam/blob/c34c367f5da6f9bef8a46471195470923a201af9/sdks/python/apache_beam/coders/coders_test_common.py#L61
   
   I have added  `def setUpClass(cls)` if it wasn't available (only `setUp` was 
in most cases), so that it shouldn't be rerun for every test of the class.
   
   This is is part of a series of PRs with goal to make Apache Beam PY3 
compatible. The proposal with the outlined approach has been documented here: 
https://s.apache.org/beam-python-3
   
   @tvalentyn @manuzhang @charlesccychen @aaltay @splovyt @Juta
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152065)
Time Spent: 10m
Remaining Estimate: 0h

> Some tests use assertItemsEqual method, not available in Python 3
> -
>
> Key: BEAM-5620
> URL: https://issues.apache.org/jira/browse/BEAM-5620
> Project: Beam
>  Issue Type: Sub-t

[jira] [Commented] (BEAM-5627) Several IO tests fail in Python 3 when accessing a temporary file with TypeError: a bytes-like object is required, not 'str'

2018-10-07 Thread Matthias Feys (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641075#comment-16641075
 ] 

Matthias Feys commented on BEAM-5627:
-

[~rakeshkumar] & [~splovyt]: this is caused by the test that create inputs with 
`str()`, which behaves differently between Py3 and Py2. See table 
below([source|https://timothybramlett.com/Strings_Bytes_and_Unicode_in_Python_2_and_3.html]):
 
|Python 2|this string literal is called a "str" object but its stored as bytes. 
If you prefix it with "u" you get a "unicode" object which is stored as Unicode 
code points.|
|Python 3|this string literal is a "str" object that stores Unicode code points 
by default. You can prefix it with "b" to get a bytes object or use .encode.|

Adding .encode('utf-8') will resolve these issues.

> Several IO tests fail in Python 3  when accessing a temporary file with  
> TypeError: a bytes-like object is required, not 'str'
> --
>
> Key: BEAM-5627
> URL: https://issues.apache.org/jira/browse/BEAM-5627
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Rakesh Kumar
>Priority: Major
>
> ERROR: test_split_at_fraction_exhaustive 
> (apache_beam.io.source_test_utils_test.SourceTestUtilsTest)
>  --
>  Traceback (most recent call last):
>File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils_test.py",
>  line 120, in test_split_at_fraction_exhaustive
>  source = self._create_source(data)
>File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils_test.py",
>  line 43, in _create_source
>  source = LineSource(self._create_file_with_data(data))
>File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils_test.py",
>  line 35, in _create_file_with_data
>  f.write(line + '\n')
>File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/tempfile.py",
>  line 622, in func_wrapper
>  return func(*args, **kwargs)
> TypeError: a bytes-like object is required, not 'str'
> Also similar:
> ==
>  ERROR: test_file_sink_writing 
> (apache_beam.io.filebasedsink_test.TestFileBasedSink)
> --
> Traceback (most recent call last):
>File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/   
>apache_beam/io/filebasedsink_test.py", line 121, in 
> test_file_sink_writing
>   init_token, writer_results = self._common_init(sink)
> File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/   
>apache_beam/io/filebasedsink_test.py", line 103, in _common_init
>   writer1 = sink.open_writer(init_token, '1')
> File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/   
>apache_beam/options/value_provider.py", line 133, in _f
>   return fnc(self, *args, **kwargs)
> File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/   
>apache_beam/io/filebasedsink.py", line 185, in open_writer
> return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix)
> File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/   
>apache_beam/io/filebasedsink.py", line 385, in __init__
>   self.temp_handle = self.sink.open(temp_shard_path)
> File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/   
>apache_beam/io/filebasedsink_test.py", line 82, in open
>   file_handle.write('[start]')
>   TypeError: a bytes-like object is required, not 'str'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5105) Move load job poll to finishBundle() method to better parallelize execution

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5105?focusedWorklogId=152075&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152075
 ]

ASF GitHub Bot logged work on BEAM-5105:


Author: ASF GitHub Bot
Created on: 07/Oct/18 14:59
Start Date: 07/Oct/18 14:59
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #6416: [BEAM-5105] 
Better parallelize BigQuery load jobs
URL: https://github.com/apache/beam/pull/6416#issuecomment-427659775
 
 
   Doesn't look like we have large scale jobs in Beam. I think we have large 
scale jobs internally.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152075)
Time Spent: 2h 10m  (was: 2h)

> Move load job poll to finishBundle() method to better parallelize execution
> ---
>
> Key: BEAM-5105
> URL: https://issues.apache.org/jira/browse/BEAM-5105
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Chamikara Jayalath
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> It appears that when we write to BigQuery using WriteTablesDoFn we start a 
> load job and wait for that job to finish.
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L318]
>  
> In cases where we are trying to write a PCollection of tables (for example, 
> when user use dynamic destinations feature) this relies on dynamic work 
> rebalancing to parallellize execution of load jobs. If the runner does not 
> support dynamic work rebalancing or does not execute dynamic work rebalancing 
> from some reason this could have significant performance drawbacks. For 
> example, scheduling times for load jobs will add up.
>  
> A better approach might be to start load jobs at process() method but wait 
> for all load jobs to finish at finishBundle() method. This will parallelize 
> any overheads as well as job execution (assuming more than one job is 
> schedule by BQ.).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Website_Publish #125

2018-10-07 Thread Apache Jenkins Server
See 


--
[...truncated 8.30 KB...]
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 6,5,main]) 
completed. Took 0.0 secs.
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 6,5,main]) 
started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 6,5,main]) 
completed. Took 2.391 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
6,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
6,5,main]) completed. Took 0.001 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) completed. Took 0.038 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
completed. Took 0.001 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) completed. Took 0.001 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) completed. Took 0.001 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
completed. Took 0.0 secs.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 3,5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output 
files.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 3,5,main]) completed. 
Took 0.003 secs.
:buildSrc:check (Thread[Task worker for ':buildSrc' Thread 3,5,main]) started.

> Task :buildSrc:check
Skipping task ':buildSrc:check' as it has no actions.
:buildSrc:check (Thread[Task worker for ':buildSrc' Th

Build failed in Jenkins: beam_PreCommit_Website_Cron #151

2018-10-07 Thread Apache Jenkins Server
See 


--
[...truncated 166.36 KB...]
  *  External link http://images/logos/runners/spark.png failed: response code 
0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/go.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/java.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/python.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/scala.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  http:// is an invalid URL (line 77)
 

  
- ./generated-content/get-started/downloads/index.html
  *  http:// is an invalid URL (line 77)
 

  
- ./generated-content/get-started/index.html
  *  External link http://get-started/beam-overview failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://get-started/mobile-gaming-example failed: response 
code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  http:// is an invalid URL (line 77)
 

  
- ./generated-content/get-started/mobile-gaming-example/index.html
  *  External link http://documentation/programming-guide/ failed: response 
code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://documentation/programming-guide/ failed: response 
code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/gaming-example-basic.png failed: response code 
0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/gaming-example-event-time-narrow.gif failed: 
response code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/gaming-example-proc-time-narrow.gif failed: 
response code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Somet

Build failed in Jenkins: beam_PreCommit_Website_Stage_GCS_Cron #13

2018-10-07 Thread Apache Jenkins Server
See 


--
[...truncated 6.80 KB...]
Skipping task ':buildSrc:classes' as it has no actions.
:buildSrc:classes (Thread[Task worker for ':buildSrc',5,main]) completed. Took 
0.0 secs.
:buildSrc:jar (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:jar
Build cache key for task ':buildSrc:jar' is 7445e5c45b21f8a690f2f547fcb49594
Caching disabled for task ':buildSrc:jar': Caching has not been enabled for the 
task
Task ':buildSrc:jar' is not up-to-date because:
  No history is available.
:buildSrc:jar (Thread[Task worker for ':buildSrc',5,main]) completed. Took 0.11 
secs.
:buildSrc:assemble (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:buildSrc:assemble (Thread[Task worker for ':buildSrc',5,main]) completed. Took 
0.0 secs.
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 1.199 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc',5,main]) 
started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 0.001 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc',5,main]) 
started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 0.028 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for 
':buildSrc',5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for 
':buildSrc',5,main]) completed. Took 0.001 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc',5,main]) completed. 
Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 0.002 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc',5,main]) 
started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 0.002 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc',5,main]) 
started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 0.002 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:

Jenkins build is back to normal : beam_PostCommit_Python_VR_Flink #265

2018-10-07 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PostCommit_Py_VR_Dataflow #1282

2018-10-07 Thread Apache Jenkins Server
See 




[beam] branch master updated (b2ad757 -> 8fdbc09)

2018-10-07 Thread reuvenlax
This is an automated email from the ASF dual-hosted git repository.

reuvenlax pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from b2ad757  Merge pull request #6578 from Ardagan/b111881785
 add d7c94c8  Make load jobs and copy jobs non blocking in processElement. 
Wait for them to finish in finishBundle instead.
 new 8fdbc09  Merge pull request #6416: [BEAM-5105] Better parallelize 
BigQuery load jobs

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../beam/sdk/io/gcp/bigquery/BigQueryHelpers.java  | 217 +++--
 .../beam/sdk/io/gcp/bigquery/WriteRename.java  | 245 +--
 .../beam/sdk/io/gcp/bigquery/WriteTables.java  | 264 +++--
 .../sdk/io/gcp/bigquery/BigQueryHelpersTest.java   |  72 ++
 .../sdk/io/gcp/bigquery/BigQueryIOWriteTest.java   |   7 +-
 5 files changed, 542 insertions(+), 263 deletions(-)



[beam] 01/01: Merge pull request #6416: [BEAM-5105] Better parallelize BigQuery load jobs

2018-10-07 Thread reuvenlax
This is an automated email from the ASF dual-hosted git repository.

reuvenlax pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 8fdbc0973949123a8fa0a27aa0f9773b1c24c6db
Merge: b2ad757 d7c94c8
Author: reuvenlax 
AuthorDate: Sun Oct 7 12:41:03 2018 -0700

Merge pull request #6416: [BEAM-5105] Better parallelize BigQuery load jobs

 .../beam/sdk/io/gcp/bigquery/BigQueryHelpers.java  | 217 +++--
 .../beam/sdk/io/gcp/bigquery/WriteRename.java  | 245 +--
 .../beam/sdk/io/gcp/bigquery/WriteTables.java  | 264 +++--
 .../sdk/io/gcp/bigquery/BigQueryHelpersTest.java   |  72 ++
 .../sdk/io/gcp/bigquery/BigQueryIOWriteTest.java   |   7 +-
 5 files changed, 542 insertions(+), 263 deletions(-)



[jira] [Work logged] (BEAM-5105) Move load job poll to finishBundle() method to better parallelize execution

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5105?focusedWorklogId=152098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152098
 ]

ASF GitHub Bot logged work on BEAM-5105:


Author: ASF GitHub Bot
Created on: 07/Oct/18 19:41
Start Date: 07/Oct/18 19:41
Worklog Time Spent: 10m 
  Work Description: reuvenlax closed pull request #6416: [BEAM-5105] Better 
parallelize BigQuery load jobs
URL: https://github.com/apache/beam/pull/6416
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java
 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java
index 83af04125df..7469a19a451 100644
--- 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java
+++ 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java
@@ -20,15 +20,18 @@
 
 import static com.google.common.base.Preconditions.checkState;
 
+import com.google.api.client.util.BackOff;
+import com.google.api.client.util.BackOffUtils;
+import com.google.api.client.util.Sleeper;
 import com.google.api.services.bigquery.model.Dataset;
 import com.google.api.services.bigquery.model.Job;
-import com.google.api.services.bigquery.model.JobReference;
 import com.google.api.services.bigquery.model.JobStatus;
 import com.google.api.services.bigquery.model.TableReference;
 import com.google.api.services.bigquery.model.TableSchema;
 import com.google.api.services.bigquery.model.TimePartitioning;
 import com.google.cloud.hadoop.util.ApiErrorExtractor;
 import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.Lists;
 import com.google.common.hash.Hashing;
 import java.io.IOException;
 import java.util.ArrayList;
@@ -40,10 +43,12 @@
 import org.apache.beam.sdk.io.FileSystems;
 import org.apache.beam.sdk.io.fs.ResolveOptions;
 import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.DatasetService;
-import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices.JobService;
 import org.apache.beam.sdk.options.ValueProvider;
 import org.apache.beam.sdk.options.ValueProvider.NestedValueProvider;
 import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.util.BackOffAdapter;
+import org.apache.beam.sdk.util.FluentBackoff;
+import org.joda.time.Duration;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -79,24 +84,210 @@ public RetryJobIdResult(RetryJobId jobId, boolean 
shouldRetry) {
 }
   }
 
+  // A class that waits for pending jobs, retrying them according to policy if 
they fail.
+  static class PendingJobManager {
+private static class JobInfo {
+  private final PendingJob pendingJob;
+  @Nullable private final SerializableFunction 
onSuccess;
+
+  public JobInfo(PendingJob pendingJob, SerializableFunction onSuccess) {
+this.pendingJob = pendingJob;
+this.onSuccess = onSuccess;
+  }
+}
+
+private List pendingJobs = Lists.newArrayList();
+private final BackOff backOff;
+
+PendingJobManager() {
+  this(
+  BackOffAdapter.toGcpBackOff(
+  FluentBackoff.DEFAULT
+  .withMaxRetries(Integer.MAX_VALUE)
+  .withInitialBackoff(Duration.standardSeconds(1))
+  .withMaxBackoff(Duration.standardMinutes(1))
+  .backoff()));
+}
+
+PendingJobManager(BackOff backOff) {
+  this.backOff = backOff;
+}
+
+// Add a pending job and a function to call when the job has completed 
successfully.
+PendingJobManager addPendingJob(
+PendingJob pendingJob, @Nullable SerializableFunction onSuccess) {
+  this.pendingJobs.add(new JobInfo(pendingJob, onSuccess));
+  return this;
+}
+
+void waitForDone() throws Exception {
+  LOG.info("Waiting for jobs to complete.");
+  Sleeper sleeper = Sleeper.DEFAULT;
+  while (!pendingJobs.isEmpty()) {
+List retryJobs = Lists.newArrayList();
+for (JobInfo jobInfo : pendingJobs) {
+  if (jobInfo.pendingJob.pollJob()) {
+// Job has completed successfully.
+LOG.info("Job {} completed successfully.", 
jobInfo.pendingJob.currentJobId);
+Exception e = jobInfo.onSuccess.apply(jobInfo.pendingJob);
+if (e != null) {
+  throw e;
+}
+  } else {
+// Job failed, schedule it again.
+LOG.info("Job {} failed. retrying.", 
jobInfo.pendingJob.currentJobId);
+retryJobs.a

Build failed in Jenkins: beam_PostCommit_Website_Publish #126

2018-10-07 Thread Apache Jenkins Server
See 


Changes:

[relax] Make load jobs and copy jobs non blocking in processElement. Wait for

--
[...truncated 8.22 KB...]
> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:buildSrc:assemble (Thread[Task worker for ':buildSrc',5,main]) completed. Took 
0.0 secs.
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 1.472 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc',5,main]) 
started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 0.0 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc',5,main]) 
started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 0.025 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for 
':buildSrc',5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for 
':buildSrc',5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc',5,main]) completed. 
Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 0.001 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc',5,main]) 
started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 0.001 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc',5,main]) 
started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 0.001 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc',5,main]) completed. 
Took 0.001 secs.
:buildSrc:test (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output 
files.
:buildSrc:test (Thread[Task worker for ':buildSrc',5,main]) completed. Took 
0.002 secs.
:buildSrc:check (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:check
Skipping task ':buildSrc:check' as it has no actions.
:buildSrc:check (Thread[Task worker for ':buildSrc',5,main]) completed.

[jira] [Work logged] (BEAM-4461) Create a library of useful transforms that use schemas

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4461?focusedWorklogId=152101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152101
 ]

ASF GitHub Bot logged work on BEAM-4461:


Author: ASF GitHub Bot
Created on: 07/Oct/18 20:34
Start Date: 07/Oct/18 20:34
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #6298: 
[BEAM-4461] Introduce Group transform.
URL: https://github.com/apache/beam/pull/6298#discussion_r223226864
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/SchemaAggregateFn.java
 ##
 @@ -0,0 +1,317 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.sdk.schemas.transforms;
+
+import com.google.auto.value.AutoValue;
+import com.google.common.collect.Lists;
+import java.io.Serializable;
+import java.util.List;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.coders.CannotProvideCoderException;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.coders.CoderRegistry;
+import org.apache.beam.sdk.coders.RowCoder;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.FieldTypeDescriptors;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.SchemaCoder;
+import org.apache.beam.sdk.transforms.Combine.CombineFn;
+import org.apache.beam.sdk.transforms.CombineFns;
+import org.apache.beam.sdk.transforms.CombineFns.CoCombineResult;
+import org.apache.beam.sdk.transforms.CombineFns.ComposedCombineFn;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.transforms.SerializableFunctions;
+import org.apache.beam.sdk.transforms.SimpleFunction;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.sdk.values.TupleTag;
+
+@Experimental(Kind.SCHEMAS)
+class SchemaAggregateFn {
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152101)
Time Spent: 12h 40m  (was: 12.5h)

> Create a library of useful transforms that use schemas
> --
>
> Key: BEAM-4461
> URL: https://issues.apache.org/jira/browse/BEAM-4461
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> e.g. JoinBy(fields). Project, Filter, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Website_Publish #127

2018-10-07 Thread Apache Jenkins Server
See 


--
[...truncated 7.67 KB...]
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
started.

> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
completed. Took 0.0 secs.
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
completed. Took 1.521 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) completed. Took 0.024 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
completed. Took 0.001 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) completed. Took 0.001 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) completed. Took 0.001 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
completed. Took 0.0 secs.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 8,5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output 
files.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 8,5,main]) completed. 
Took 0.004 secs.
:buildSrc:check (Thread[Task worker for ':buil

Build failed in Jenkins: beam_PreCommit_Website_Cron #152

2018-10-07 Thread Apache Jenkins Server
See 


Changes:

[relax] Make load jobs and copy jobs non blocking in processElement. Wait for

--
[...truncated 168.64 KB...]
  *  External link http://images/logos/runners/spark.png failed: response code 
0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/go.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/java.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/python.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/scala.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  http:// is an invalid URL (line 77)
 

  
- ./generated-content/get-started/downloads/index.html
  *  http:// is an invalid URL (line 77)
 

  
- ./generated-content/get-started/index.html
  *  External link http://get-started/beam-overview failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://get-started/mobile-gaming-example failed: response 
code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  http:// is an invalid URL (line 77)
 

  
- ./generated-content/get-started/mobile-gaming-example/index.html
  *  External link http://documentation/programming-guide/ failed: response 
code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://documentation/programming-guide/ failed: response 
code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/gaming-example-basic.png failed: response code 
0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/gaming-example-event-time-narrow.gif failed: 
response code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/gaming-example-proc-time-narrow.gif failed: 
response code 0 means something's wrong.
 It's

Build failed in Jenkins: beam_PreCommit_Website_Stage_GCS_Cron #14

2018-10-07 Thread Apache Jenkins Server
See 


Changes:

[relax] Make load jobs and copy jobs non blocking in processElement. Wait for

--
[...truncated 7.82 KB...]
Task ':buildSrc:jar' is not up-to-date because:
  No history is available.
:buildSrc:jar (Thread[Task worker for ':buildSrc' Thread 4,5,main]) completed. 
Took 0.109 secs.
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 4,5,main]) 
started.

> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 4,5,main]) 
completed. Took 0.0 secs.
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 4,5,main]) 
started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 4,5,main]) 
completed. Took 1.475 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
4,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
4,5,main]) completed. Took 0.001 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
4,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
4,5,main]) completed. Took 0.03 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
4,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
4,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 4,5,main]) 
started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 4,5,main]) 
completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 4,5,main]) 
started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 4,5,main]) 
completed. Took 0.001 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
4,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
4,5,main]) completed. Took 0.001 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
4,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
4,5,main]) completed. Took 0.001 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 4,5,main]) 
started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 4,5,main]) 
completed. Took 0.0 secs.
:buildSrc:test (Threa

[jira] [Commented] (BEAM-5442) PortableRunner swallows custom options for Runner

2018-10-07 Thread Thomas Weise (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641279#comment-16641279
 ] 

Thomas Weise commented on BEAM-5442:


[~mxm] I checked and see that an unknown option such as 
--checkpointing_interval=3 is now passed through to the runner. BTW 
checkpointing doesn't seem to work for other reasons, but that's a separate 
issue.

> PortableRunner swallows custom options for Runner
> -
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
> Fix For: 2.8.0
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing 
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the 
> PortableRunner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152138
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 08/Oct/18 05:37
Start Date: 08/Oct/18 05:37
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6590: 
[BEAM-5315] Partially port io
URL: https://github.com/apache/beam/pull/6590#discussion_r223250514
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -140,7 +140,7 @@ def get_version():
 # oauth2client >=4 only works with google-apitools>=0.5.18.
 'google-apitools>=0.5.18,<=0.5.20',
 'proto-google-cloud-datastore-v1>=0.90.0,<=0.90.4',
-'googledatastore==7.0.1',
+'googledatastore==7.0.1; python_version < "3.5.0"',
 
 Review comment:
   Let's change version qualifier to `python_version < "3.0"`, since AFAIK 
googledatastore will not work on any version of Python 3. We can also add a 
`#TODO(BEAM-...)` comment, in case anyone reading this line will wonder.
   Note that we used `python_version < "3.5.0"` on `typing` is a back-port 
module required only for Python interpreter less that 3.5. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152138)
Time Spent: 1h 40m  (was: 1.5h)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Simon
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152139
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 08/Oct/18 05:37
Start Date: 08/Oct/18 05:37
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6590: 
[BEAM-5315] Partially port io
URL: https://github.com/apache/beam/pull/6590#discussion_r223249900
 
 

 ##
 File path: sdks/python/apache_beam/io/source_test_utils_test.py
 ##
 @@ -115,6 +131,9 @@ def test_split_at_fraction_binary(self):
 self.assertTrue(stats.successful_fractions)
 self.assertTrue(stats.non_trivial_fractions)
 
+  @unittest.skipIf(sys.version_info[0] == 3, 'This test still needs to be '
 
 Review comment:
   This PR has a lot of tests excluded due to `BEAM-5627`,  I assume that you 
have verified that these excluded tests have the symptom described by 
`BEAM-5627`. If they fail for a different reason, we should use a different 
Jira.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152139)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Simon
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152140&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152140
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 08/Oct/18 05:37
Start Date: 08/Oct/18 05:37
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6590: 
[BEAM-5315] Partially port io
URL: https://github.com/apache/beam/pull/6590#discussion_r223249310
 
 

 ##
 File path: sdks/python/apache_beam/io/filebasedsource_test.py
 ##
 @@ -51,6 +52,9 @@
 
 class LineSource(FileBasedSource):
 
+  @unittest.skipIf(sys.version_info[0] == 3, 'This test still needs to be '
 
 Review comment:
   Let's introduce an environment variable `RUN_SKIPPED_PY3_TESTS`, and if it 
is set to 1, let's not skip the tests. In a separate PR we should also add a 
possibility to unskip Python 3 tests we have already skipped. This will make it 
easier to see how many tests we should still fix, and which tests are already 
passing.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152140)
Time Spent: 1h 50m  (was: 1h 40m)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Simon
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152141
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 08/Oct/18 05:37
Start Date: 08/Oct/18 05:37
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6590: 
[BEAM-5315] Partially port io
URL: https://github.com/apache/beam/pull/6590#discussion_r223249468
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/datastore/v1/helper_test.py
 ##
 @@ -28,9 +28,12 @@
 from mock import MagicMock
 
 # pylint: disable=ungrouped-imports
-from apache_beam.io.gcp.datastore.v1 import fake_datastore
-from apache_beam.io.gcp.datastore.v1 import helper
-from apache_beam.testing.test_utils import patch_retry
+try:
 
 Review comment:
   In places where we skip Datastore imports, let's add a comment, something 
like:
   `# TODO(BEAM-4543): googledatastore dependency does not work on Python 3.`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152141)
Time Spent: 2h  (was: 1h 50m)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Simon
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PreCommit_Website_Stage_GCS_Cron #15

2018-10-07 Thread Apache Jenkins Server
See 


--
[...truncated 6.89 KB...]
Skipping task ':buildSrc:classes' as it has no actions.
:buildSrc:classes (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
completed. Took 0.0 secs.
:buildSrc:jar (Thread[Task worker for ':buildSrc' Thread 3,5,main]) started.

> Task :buildSrc:jar
Build cache key for task ':buildSrc:jar' is 7445e5c45b21f8a690f2f547fcb49594
Caching disabled for task ':buildSrc:jar': Caching has not been enabled for the 
task
Task ':buildSrc:jar' is not up-to-date because:
  No history is available.
:buildSrc:jar (Thread[Task worker for ':buildSrc' Thread 3,5,main]) completed. 
Took 0.103 secs.
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
started.

> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
completed. Took 0.001 secs.
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
completed. Took 1.274 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) completed. Took 0.001 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) completed. Took 0.028 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
completed. Took 0.003 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) completed. Took 0.002 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) completed. Took 0.001

Build failed in Jenkins: beam_PreCommit_Website_Cron #153

2018-10-07 Thread Apache Jenkins Server
See 


--
[...truncated 166.70 KB...]
  *  External link http://images/logos/runners/spark.png failed: response code 
0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/go.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/java.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/python.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/scala.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  http:// is an invalid URL (line 77)
 

  
- ./generated-content/get-started/downloads/index.html
  *  http:// is an invalid URL (line 77)
 

  
- ./generated-content/get-started/index.html
  *  External link http://get-started/beam-overview failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://get-started/mobile-gaming-example failed: response 
code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  http:// is an invalid URL (line 77)
 

  
- ./generated-content/get-started/mobile-gaming-example/index.html
  *  External link http://documentation/programming-guide/ failed: response 
code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://documentation/programming-guide/ failed: response 
code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/gaming-example-basic.png failed: response code 
0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/gaming-example-event-time-narrow.gif failed: 
response code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/gaming-example-proc-time-narrow.gif failed: 
response code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Somet

Build failed in Jenkins: beam_PostCommit_Website_Publish #128

2018-10-07 Thread Apache Jenkins Server
See 


--
[...truncated 7.75 KB...]
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
started.

> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
completed. Took 0.0 secs.
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
completed. Took 1.398 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) completed. Took 0.022 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
completed. Took 0.001 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) completed. Took 0.001 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) completed. Took 0.001 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
completed. Took 0.0 secs.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 8,5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output 
files.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 8,5,main]) completed. 
Took 0.002 secs.
:buildSrc:check (Thread[Task worker for ':buil

[jira] [Work logged] (BEAM-4461) Create a library of useful transforms that use schemas

2018-10-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4461?focusedWorklogId=152142&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152142
 ]

ASF GitHub Bot logged work on BEAM-4461:


Author: ASF GitHub Bot
Created on: 08/Oct/18 06:06
Start Date: 08/Oct/18 06:06
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #6298: 
[BEAM-4461] Introduce Group transform.
URL: https://github.com/apache/beam/pull/6298#discussion_r223254160
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Group.java
 ##
 @@ -0,0 +1,684 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.sdk.schemas.transforms;
+
+import java.util.List;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.coders.KvCoder;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.SchemaCoder;
+import org.apache.beam.sdk.transforms.Combine;
+import org.apache.beam.sdk.transforms.Combine.CombineFn;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.GroupByKey;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.Values;
+import org.apache.beam.sdk.transforms.WithKeys;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * A generic grouping transform for schema {@link PCollection}s.
+ *
+ * When used without a combiner, this transforms simply acts as a {@link 
GroupByKey} but without
+ * the need for the user to explicitly extract the keys. For example, consider 
the following input
+ * type:
+ *
+ * {@code
+ * {@literal @DefaultSchema(JavaFieldSchema.class)}
+ * public class UserPurchase {
+ *   public String userId;
+ *   public String country;
+ *   public long cost;
+ *   public double transactionDuration;
+ * }
+ *
+ * {@literal PCollection} purchases = readUserPurchases();
+ * }
+ *
+ * You can group all purchases by user and country as follows:
+ *
+ * {@code
+ * {@literal @DefaultSchema(JavaFieldSchema.class)}
+ * {@literal PCollection>} byUser =
+ *   purchases.apply(Group.byFieldNames("userId', "country"));
+ * }
+ *
+ * However often an aggregation of some form is desired. The builder 
methods inside the Group
+ * class allows building up separate aggregations for every field (or set of 
fields) on the input
+ * schema, and generating an output schema based on these aggregations. For 
example:
+ *
+ * {@code
+ * {@literal PCollection>} aggregated = purchases
+ *  .apply(Group.byFieldNames("userId', "country")
+ *  .aggregateField("cost", Sum.ofLongs(), "total_cost")
 
 Review comment:
   This is not a DataFrames API here. I think we should have one, but it might 
make more sense as a layer on top of Calcite.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152142)
Time Spent: 12h 50m  (was: 12h 40m)

> Create a library of useful transforms that use schemas
> --
>
> Key: BEAM-4461
> URL: https://issues.apache.org/jira/browse/BEAM-4461
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>
> e.g. JoinBy(fields). Project, Filter, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#