[jira] [Commented] (BEAM-7526) beam_PostCommit_SQL suite is failing since June 7.
[ https://issues.apache.org/jira/browse/BEAM-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860600#comment-16860600 ]

Charith Ellawala commented on BEAM-7526:
----------------------------------------

I'll take a look.

> beam_PostCommit_SQL suite is failing since June 7.
> --------------------------------------------------
>
>                 Key: BEAM-7526
>                 URL: https://issues.apache.org/jira/browse/BEAM-7526
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-sql
>            Reporter: Valentyn Tymofieiev
>            Assignee: Reuven Lax
>            Priority: Major
>
> Error Message:
> java.lang.UnsupportedOperationException: Converting BigQuery type 'class
> com.google.api.client.util.ArrayMap' to 'FieldType{typeName=STRING,
> nullable=false, logicalType=null, collectionElementType=null,
> mapKeyType=null, mapValueType=null, rowSchema=null, metadata={}}' is not
> supported
>
> Stacktrace:
> java.lang.UnsupportedOperationException: Converting BigQuery type 'class
> com.google.api.client.util.ArrayMap' to 'FieldType{typeName=STRING,
> nullable=false, logicalType=null, collectionElementType=null,
> mapKeyType=null, mapValueType=null, rowSchema=null, metadata={}}' is not
> supported
> 	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toBeamValue(BigQueryUtils.java:457)
> 	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.lambda$toBeamValue$11(BigQueryUtils.java:447)
> 	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> 	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> 	...

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (BEAM-7527) Beam Python integration test suites are flaky: ModuleNotFoundError
[ https://issues.apache.org/jira/browse/BEAM-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valentyn Tymofieiev updated BEAM-7527:
--------------------------------------
    Description:

I am seeing several errors in Python SDK integration test suites, such as Dataflow ValidatesRunner and Python PostCommit, that fail because one of the autogenerated files is not found. For example:

{noformat}
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/__init__.py:84: UserWarning: Running the Apache Beam SDK on Python 3 is not yet fully supported. You may encounter buggy behavior or missing features.
  'Running the Apache Beam SDK on Python 3 is not yet fully supported. '
Failure: ModuleNotFoundError (No module named 'beam_runner_api_pb2') ... ERROR
======================================================================
ERROR: Failure: ModuleNotFoundError (No module named 'beam_runner_api_pb2')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/imp.py", line 245, in load_module
    return load_package(name, filename)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/imp.py", line 217, in load_package
    return _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/__init__.py", line 97, in <module>
    from apache_beam import coders
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/coders/__init__.py", line 19, in <module>
    from apache_beam.coders.coders import *
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/coders/coders.py", line 32, in <module>
    from apache_beam.coders import coder_impl
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/coders/coder_impl.py", line 44, in <module>
    from apache_beam.utils import windowed_value
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/windowed_value.py", line 34, in <module>
    from apache_beam.utils.timestamp import MAX_TIMESTAMP
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/timestamp.py", line 34, in <module>
    from apache_beam.portability import common_urns
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/portability/common_urns.py", line 25, in <module>
    from apache_beam.portability.api import metrics_pb2
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/portability/api/metrics_pb2.py", line 16, in <module>
    import beam_runner_api_pb2 as beam__runner__api__pb2
ModuleNotFoundError: No module named 'beam_runner_api_pb2'
{noformat}

{noformat}
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/__init__.py:84: UserWarning: Running the Apache Beam SDK on Python 3 is not yet fully supported. You may encounter buggy behavior or missing features.
  'Running the Apache Beam SDK on Python 3 is not yet fully supported. '
Failure: ModuleNotFoundError (No module named 'endpoints_pb2') ... ERROR
======================================================================
ERROR: Failure: ModuleNotFoundError (No module named 'endpoints_pb2')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/failure.py", line 39, in runTest
    ra
{noformat}
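The innermost failing frame is `metrics_pb2.py` doing a bare `import beam_runner_api_pb2`. Python 3 removed implicit relative imports, so a generated module importing a sibling this way only resolves if the package directory itself happens to be on `sys.path`. A minimal reproduction of that difference, using hypothetical module names as stand-ins for the generated `_pb2` files (this sketches the failure mode, not Beam's actual fix):

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Python 3 removed implicit relative imports: a module that does a bare
# `import sibling_pb2` (py2-style generated code) fails unless the import
# is package-relative or the package dir itself is on sys.path.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "api"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "a_pb2.py").write_text("VALUE = 42\n")
    (pkg / "b_pb2.py").write_text("import a_pb2\n")         # py2-style import
    (pkg / "c_pb2.py").write_text("from . import a_pb2\n")  # py3-safe import

    def try_import(mod):
        # Import the module in a fresh interpreter whose cwd is the tmp dir,
        # so `api` is importable but the package dir itself is not on sys.path.
        proc = subprocess.run([sys.executable, "-c", f"import {mod}"],
                              cwd=tmp, capture_output=True, text=True)
        return proc.returncode, proc.stderr

    bare_rc, bare_err = try_import("api.b_pb2")      # fails on Python 3
    relative_rc, _ = try_import("api.c_pb2")         # works

assert bare_rc != 0 and "ModuleNotFoundError" in bare_err
assert relative_rc == 0
```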
[jira] [Commented] (BEAM-7527) Beam Python integration test suites are flaky: ModuleNotFoundError
[ https://issues.apache.org/jira/browse/BEAM-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860467#comment-16860467 ]

Valentyn Tymofieiev commented on BEAM-7527:
-------------------------------------------

Same error in the Python SDK WordCount performance test suite: https://builds.apache.org/job/beam_PerformanceTests_WordCountIT_Py27/103/console

> Beam Python integration test suites are flaky: ModuleNotFoundError
> ------------------------------------------------------------------
>
>                 Key: BEAM-7527
>                 URL: https://issues.apache.org/jira/browse/BEAM-7527
>             Project: Beam
>          Issue Type: Bug
>          Components: test-failures
>            Reporter: Valentyn Tymofieiev
>            Priority: Major
[jira] [Commented] (BEAM-7425) Reading BigQuery Table Data into Java Classes(Pojo) Directly
[ https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860466#comment-16860466 ]

Kishan Kumar commented on BEAM-7425:
------------------------------------

[~kedin] Thanks for the clarification. For every other data type I can simply read the value as a String from the SchemaAndRecord class while parsing after reading from BQ, but for the NUMERIC data type it either fails with an exception or yields a value such as *last_update_amount=java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]*. So instead of just asking for this fix, I asked for a new feature where I only need to pass a POJO class and my values are parsed and set automatically, which gets rid of all this parsing work. I don't want to use TableRow because of the following issue: https://stackoverflow.com/questions/48184043/bigqueryio-read-performance-very-slow-apache-beam?rq=1

> Reading BigQuery Table Data into Java Classes(Pojo) Directly
> ------------------------------------------------------------
>
>                 Key: BEAM-7425
>                 URL: https://issues.apache.org/jira/browse/BEAM-7425
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-avro, io-java-gcp
>    Affects Versions: 2.12.0
>         Environment: Dataflow
>            Reporter: Kishan Kumar
>            Priority: Major
>
> While developing my code I used the below code snippet to read the table data
> from BigQuery.
>
> {code:java}
> PCollection<ReasonCode> gpseEftReasonCodes = input
>     .apply("Reading xxyyzz",
>         BigQueryIO
>             .read(new ReadTable<>(ReasonCode.class))
>             .withoutValidation()
>             .withTemplateCompatibility()
>             .fromQuery("Select * from dataset.xxyyzz")
>             .usingStandardSql()
>             .withCoder(SerializableCoder.of(xxyyzz.class))
> {code}
> ReadTable class:
> {code:java}
> @DefaultSchema(JavaBeanSchema.class)
> public class ReadTable<T> implements SerializableFunction<SchemaAndRecord, T> {
>   private static final long serialVersionUID = 1L;
>   private static Gson gson = new Gson();
>   public static final Logger LOG = LoggerFactory.getLogger(ReadTable.class);
>   private final Counter countingRecords =
>       Metrics.counter(ReadTable.class, "Reading Records EFT Report");
>   private Class<T> class1;
>
>   public ReadTable(Class<T> class1) { this.class1 = class1; }
>
>   public T apply(SchemaAndRecord schemaAndRecord) {
>     Map<String, String> mapping = new HashMap<>();
>     int counter = 0;
>     try {
>       GenericRecord s = schemaAndRecord.getRecord();
>       org.apache.avro.Schema s1 = s.getSchema();
>       for (Field f : s1.getFields()) {
>         counter++;
>         mapping.put(f.name(), null == s.get(f.name()) ? null : String.valueOf(s.get(counter)));
>       }
>       countingRecords.inc();
>       JsonElement jsonElement = gson.toJsonTree(mapping);
>       return gson.fromJson(jsonElement, class1);
>     } catch (Exception mp) {
>       LOG.error("Found Wrong Mapping for the Record: " + mapping);
>       mp.printStackTrace();
>       return null;
>     }
>   }
> }
> {code}
> After reading the data from BigQuery and mapping it from SchemaAndRecord to
> the POJO, columns whose data type is NUMERIC came back as:
> {code}
> last_update_amount=java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]
> {code}
> My expectation was to get the exact value, but I am getting the ByteBuffer.
> The version I am using is Apache Beam 2.12.0. If any more information is
> needed, please let me know.
>
> Way 2 tried:
> {code:java}
> GenericRecord s = schemaAndRecord.getRecord();
> org.apache.avro.Schema s1 = s.getSchema();
> for (Field f : s1.getFields()) {
>   counter++;
>   mapping.put(f.name(), null == s.get(f.name()) ? null : String.valueOf(s.get(counter)));
>   if (f.name().equalsIgnoreCase("reason_code_id")) {
>     BigDecimal numericValue = new Conversions.DecimalConversion()
>         .fromBytes((ByteBuffer) s.get(f.name()), Schema.create(s1.getType()), s1.getLogicalType());
>     System.out.println("Numeric Con" + numericValue);
>   } else {
>     System.out.println("Else Condition " + f.name());
>   }
> {code}
> Facing issue:
> {code}
> 2019-05-24 (14:10:37) org.apache.avro.AvroRuntimeException: Can't create a: RECORD
> {code}
>
> It would be great to have a method that maps BigQuery data onto the POJO
> schema: if I have 10 columns in BQ and my POJO needs only 5 of them,
> BigQueryIO should map only those 5 values into the Java class and reject the
> rest, which I am currently doing with considerable effort. The NUMERIC data
> type should be deserialized automatically while fetching data, as it is with
> TableRow.
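The HeapByteBuffer the reporter sees is the raw Avro encoding of BigQuery's NUMERIC type: a "decimal" logical type stored as a big-endian two's-complement unscaled integer, with NUMERIC's fixed scale of 9. A sketch of the decoding (this is not Beam's actual code path, and the helper name is made up; the Java equivalent is the `Conversions.DecimalConversion` call shown in the issue):

```python
from decimal import Decimal

def decode_bq_numeric(raw: bytes, scale: int = 9) -> Decimal:
    """Decode an Avro 'decimal' value: a big-endian two's-complement
    unscaled integer divided by 10**scale (scale 9 for BigQuery NUMERIC)."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

# 12.5 at scale 9 is stored as the unscaled integer 12_500_000_000.
raw = (12_500_000_000).to_bytes(8, byteorder="big", signed=True)
assert decode_bq_numeric(raw) == Decimal("12.5")
```

The key point for the bug report is that the bytes are meaningless without the schema's scale, which is why a generic `String.valueOf` on the ByteBuffer prints the buffer object instead of the number.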
[jira] [Created] (BEAM-7527) Beam Python integration test suites are flaky: ModuleNotFoundError
Valentyn Tymofieiev created BEAM-7527:
--------------------------------------

             Summary: Beam Python integration test suites are flaky: ModuleNotFoundError
                 Key: BEAM-7527
                 URL: https://issues.apache.org/jira/browse/BEAM-7527
             Project: Beam
          Issue Type: Bug
          Components: test-failures
            Reporter: Valentyn Tymofieiev

I am seeing several errors in Python SDK integration test suites, such as Dataflow ValidatesRunner and Python PostCommit, that fail because one of the autogenerated files is not found.
[jira] [Work logged] (BEAM-6673) BigQueryIO.Read should automatically produce schemas
[ https://issues.apache.org/jira/browse/BEAM-6673?focusedWorklogId=257313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257313 ]

ASF GitHub Bot logged work on BEAM-6673:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 11/Jun/19 01:38
            Start Date: 11/Jun/19 01:38
    Worklog Time Spent: 10m

Work Description: tvalentyn commented on issue #8620: [BEAM-6673] Add schema support to BigQuery reads
URL: https://github.com/apache/beam/pull/8620#issuecomment-500652691

This PR breaks the Beam SQL postcommit suite, see: https://issues.apache.org/jira/browse/BEAM-7526.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 257313)
    Time Spent: 2h 20m  (was: 2h 10m)

> BigQueryIO.Read should automatically produce schemas
> ----------------------------------------------------
>
>                 Key: BEAM-6673
>                 URL: https://issues.apache.org/jira/browse/BEAM-6673
>             Project: Beam
>          Issue Type: Sub-task
>          Components: io-java-gcp
>            Reporter: Reuven Lax
>            Assignee: Shehzaad Nakhoda
>            Priority: Major
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The output PCollections should contain
[jira] [Assigned] (BEAM-7526) beam_PostCommit_SQL suite is failing since June 7.
[ https://issues.apache.org/jira/browse/BEAM-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valentyn Tymofieiev reassigned BEAM-7526:
-----------------------------------------
    Assignee: Reuven Lax
[jira] [Commented] (BEAM-7526) beam_PostCommit_SQL suite is failing since June 7.
[ https://issues.apache.org/jira/browse/BEAM-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860461#comment-16860461 ]

Valentyn Tymofieiev commented on BEAM-7526:
-------------------------------------------

https://github.com/apache/beam/pull/8620 seems to be the root cause, as it is the only relevant commit shown in the first failing run: https://builds.apache.org/job/beam_PostCommit_SQL/1746/

I cannot find a Jira username for @charithe, so assigning to [~reuvenlax]. cc: [~shehzaadn] [~kedin]
[jira] [Work logged] (BEAM-7428) ReadAllViaFileBasedSource does not output the timestamps of the read elements
[ https://issues.apache.org/jira/browse/BEAM-7428?focusedWorklogId=257311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257311 ]

ASF GitHub Bot logged work on BEAM-7428:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 11/Jun/19 01:37
            Start Date: 11/Jun/19 01:37
    Worklog Time Spent: 10m

Work Description: jkff commented on issue #8741: [BEAM-7428] Output the timestamp on elements in ReadAllViaFileBasedSource
URL: https://github.com/apache/beam/pull/8741#issuecomment-500652579

I think we discussed this before, and concluded that reader.getCurrentTimestamp() ought to return a timestamp greater than the timestamp of the element being read, which is consistent with watermark logic everywhere in Beam (including SDF): you can output forward in time but not backward. Most BoundedSources indeed output at -INF, which is inconsistent with the above. On the other hand, you could argue that they do so because they assume their source itself has a timestamp of -INF, and what they really mean is "output at the timestamp of my source".

Issue Time Tracking
-------------------
    Worklog Id: (was: 257311)
    Time Spent: 4h  (was: 3h 50m)

> ReadAllViaFileBasedSource does not output the timestamps of the read elements
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-7428
>                 URL: https://issues.apache.org/jira/browse/BEAM-7428
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Ismaël Mejía
>            Assignee: Ismaël Mejía
>            Priority: Minor
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> This differs from the implementation of JavaReadViaImpulse, which tackles a
> similar problem but does output the timestamps correctly.
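The ordering rule jkff describes can be stated as a one-line check. This is an illustrative sketch with made-up names, not Beam's API: outputs may move forward in time relative to the reader's current timestamp, never backward, and a reader reporting -INF (as most BoundedSources do) places no constraint at all.

```python
# Sentinel for the timestamp most BoundedSources report for their elements.
MIN_TIMESTAMP = float("-inf")

def checked_output_timestamp(reader_ts: float, output_ts: float) -> float:
    """Allow outputting at or after the reader's timestamp, never before."""
    if output_ts < reader_ts:
        raise ValueError("cannot output backward in time")
    return output_ts

# A reader at -INF permits any output timestamp:
assert checked_output_timestamp(MIN_TIMESTAMP, 0.0) == 0.0
```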
[jira] [Work logged] (BEAM-6947) TestGCSIO.test_last_updated (gcsio_test.py) fails when the current timezone offset and the timezone offset on 2nd of Jan 1970 differ
[ https://issues.apache.org/jira/browse/BEAM-6947?focusedWorklogId=257312&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257312 ]

ASF GitHub Bot logged work on BEAM-6947:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 11/Jun/19 01:37
            Start Date: 11/Jun/19 01:37
    Worklog Time Spent: 10m

Work Description: stale[bot] commented on issue #8180: [BEAM-6947] Fix for failing TestGCSIO.test_last_updated (gcsio_test.py) test
URL: https://github.com/apache/beam/pull/8180#issuecomment-500652616

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions.

Issue Time Tracking
-------------------
    Worklog Id: (was: 257312)
    Time Spent: 40m  (was: 0.5h)

> TestGCSIO.test_last_updated (gcsio_test.py) fails when the current timezone
> offset and the timezone offset on 2nd of Jan 1970 differ
> ---------------------------------------------------------------------------
>
>                 Key: BEAM-6947
>                 URL: https://issues.apache.org/jira/browse/BEAM-6947
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Csaba Kassai
>            Assignee: Csaba Kassai
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The test TestGCSIO.test_last_updated uses the timestamp 123456.78 as the last
> updated time. This timestamp is converted into a naive datetime in the
> FakeFile class get_metadata method (gcsio_test.py, line 72), then converted
> back to a timestamp in the GcsIO class last_updated method (gcsio.py). But the
> conversion is incorrect when the timezone offset differs between 1970 and now.
> In my case, Singapore is GMT+8 now but was only GMT+7:30 in 1970, so the test
> fails.
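The round-trip bug described above can be reproduced in a few lines. This is a sketch of the general pitfall, not the actual gcsio code: converting a naive UTC datetime back through the local clock picks up the machine's *current* offset, while interpreting it explicitly as UTC is exact.

```python
import calendar
import time
from datetime import datetime

LAST_UPDATED = 123456.78  # the test's timestamp: 2 Jan 1970 UTC

# utcfromtimestamp() yields a *naive* datetime; converting it back with
# time.mktime() reinterprets it in the machine's current local offset.
# In a zone whose offset changed since 1970 (Singapore: GMT+7:30 then,
# GMT+8 now) this round trip is off by the difference.
naive = datetime.utcfromtimestamp(LAST_UPDATED)
lossy = time.mktime(naive.timetuple()) + naive.microsecond / 1e6

# Treating the naive datetime explicitly as UTC makes the round trip exact
# regardless of the local zone's history:
exact = calendar.timegm(naive.timetuple()) + naive.microsecond / 1e6
assert abs(exact - LAST_UPDATED) < 1e-6
```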
[jira] [Created] (BEAM-7526) beam_PostCommit_SQL suite is failing since June 7.
Valentyn Tymofieiev created BEAM-7526:
--------------------------------------

             Summary: beam_PostCommit_SQL suite is failing since June 7.
                 Key: BEAM-7526
                 URL: https://issues.apache.org/jira/browse/BEAM-7526
             Project: Beam
          Issue Type: Bug
          Components: dsl-sql
            Reporter: Valentyn Tymofieiev
[jira] [Updated] (BEAM-6952) concatenated compressed files bug with python sdk
[ https://issues.apache.org/jira/browse/BEAM-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chamikara Jayalath updated BEAM-6952:
-------------------------------------
    Affects Version/s:     (was: 2.13.0)
                       Not applicable

> concatenated compressed files bug with python sdk
> -------------------------------------------------
>
>                 Key: BEAM-6952
>                 URL: https://issues.apache.org/jira/browse/BEAM-6952
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: Not applicable
>            Reporter: Daniel Lescohier
>            Priority: Major
>             Fix For: 2.14.0
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The Python apache_beam.io.filesystem module has a bug handling concatenated
> compressed files. The PR I will create has two commits:
> # a new unit test that shows the problem
> # a fix to the problem.
> The unit test is added to the apache_beam.io.filesystem_test module. It was
> added to this module because the test
> apache_beam.io.textio_test.test_read_gzip_concat does not encounter the
> problem in the Beam 2.11 and earlier code base: the test data is smaller than
> read_size, so it goes through logic in the code that avoids the problem. So
> this test sets read_size smaller and the test data bigger in order to trigger
> the problem. It would be difficult to test in the textio_test module, because
> you'd need very large test data: the default read_size is 16 MiB, and the
> ReadFromText interface does not allow you to modify the read_size.
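The PR itself is not quoted here, but the concatenated-gzip behaviour the fix must handle can be illustrated in a few lines: a single decompressor stops at the first member boundary, leaving the rest of the input in `unused_data`, which a buffered read path must feed to a fresh decompressor rather than discard.

```python
import gzip
import io
import zlib

# Two independently compressed gzip members concatenated into one stream,
# as produced by `cat a.gz b.gz > ab.gz`.
data = gzip.compress(b"hello ") + gzip.compress(b"world")

# A single decompressor stops at the end of the first member; the bytes of
# the next member land in unused_data and need a fresh decompressor. A read
# path that never checks unused_data silently drops everything after the
# first member once the input exceeds its read buffer.
d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # +16: expect gzip framing
first = d.decompress(data)

# gzip.GzipFile handles member boundaries itself:
full = gzip.GzipFile(fileobj=io.BytesIO(data)).read()
```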
[jira] [Resolved] (BEAM-6952) concatenated compressed files bug with python sdk
[ https://issues.apache.org/jira/browse/BEAM-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Jayalath resolved BEAM-6952. -- Resolution: Fixed > concatenated compressed files bug with python sdk > - > > Key: BEAM-6952 > URL: https://issues.apache.org/jira/browse/BEAM-6952 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.11.0 >Reporter: Daniel Lescohier >Priority: Major > Fix For: 2.14.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > The Python apache_beam.io.filesystem module has a bug handling concatenated > compressed files. > The PR I will create has two commits: > # a new unit test that shows the problem > # a fix to the problem. > The unit test is added to the apache_beam.io.filesystem_test module. It was > added to this module because the test: > apache_beam.io.textio_test.test_read_gzip_concat does not encounter the > problem in the Beam 2.11 and earlier code base because the test data is too > small: the data is smaller than read_size, so it goes through logic in the > code that avoids the problem in the code. So, this test sets read_size > smaller and test data bigger, in order to encounter the problem. It would be > difficult to test in the textio_test module, because you'd need very large > test data because default read_size is 16MiB, and the ReadFromText interface > does not allow you to modify the read_size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6952) concatenated compressed files bug with python sdk
[ https://issues.apache.org/jira/browse/BEAM-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Jayalath updated BEAM-6952: - Affects Version/s: (was: 2.11.0) 2.13.0 > concatenated compressed files bug with python sdk > - > > Key: BEAM-6952 > URL: https://issues.apache.org/jira/browse/BEAM-6952 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.13.0 >Reporter: Daniel Lescohier >Priority: Major > Fix For: 2.14.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > The Python apache_beam.io.filesystem module has a bug handling concatenated > compressed files. > The PR I will create has two commits: > # a new unit test that shows the problem > # a fix to the problem. > The unit test is added to the apache_beam.io.filesystem_test module. It was > added to this module because the test: > apache_beam.io.textio_test.test_read_gzip_concat does not encounter the > problem in the Beam 2.11 and earlier code base because the test data is too > small: the data is smaller than read_size, so it goes through logic in the > code that avoids the problem in the code. So, this test sets read_size > smaller and test data bigger, in order to encounter the problem. It would be > difficult to test in the textio_test module, because you'd need very large > test data because default read_size is 16MiB, and the ReadFromText interface > does not allow you to modify the read_size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6952) concatenated compressed files bug with python sdk
[ https://issues.apache.org/jira/browse/BEAM-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860455#comment-16860455 ] Chamikara Jayalath commented on BEAM-6952: -- Yeah. closing. > concatenated compressed files bug with python sdk > - > > Key: BEAM-6952 > URL: https://issues.apache.org/jira/browse/BEAM-6952 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.11.0 >Reporter: Daniel Lescohier >Priority: Major > Fix For: 2.14.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > The Python apache_beam.io.filesystem module has a bug handling concatenated > compressed files. > The PR I will create has two commits: > # a new unit test that shows the problem > # a fix to the problem. > The unit test is added to the apache_beam.io.filesystem_test module. It was > added to this module because the test: > apache_beam.io.textio_test.test_read_gzip_concat does not encounter the > problem in the Beam 2.11 and earlier code base because the test data is too > small: the data is smaller than read_size, so it goes through logic in the > code that avoids the problem in the code. So, this test sets read_size > smaller and test data bigger, in order to encounter the problem. It would be > difficult to test in the textio_test module, because you'd need very large > test data because default read_size is 16MiB, and the ReadFromText interface > does not allow you to modify the read_size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7399) Publishing a blog post about looping timers.
[ https://issues.apache.org/jira/browse/BEAM-7399?focusedWorklogId=257308&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257308 ] ASF GitHub Bot logged work on BEAM-7399: Author: ASF GitHub Bot Created on: 11/Jun/19 00:52 Start Date: 11/Jun/19 00:52 Worklog Time Spent: 10m Work Description: rezarokni commented on issue #8686: [BEAM-7399] Blog post on looping timers URL: https://github.com/apache/beam/pull/8686#issuecomment-500644848 @kennknowles fixed conflict and changed dates. Also when I make use of gradle to see the staged web, although this blog does not show up, I noticed the last blog in master also does not show up. Be interesting to see if on merge this magically appears This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257308) Time Spent: 50m (was: 40m) > Publishing a blog post about looping timers. > - > > Key: BEAM-7399 > URL: https://issues.apache.org/jira/browse/BEAM-7399 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Reza ardeshir rokni >Assignee: Reza ardeshir rokni >Priority: Trivial > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7043) Add DynamoDBIO
[ https://issues.apache.org/jira/browse/BEAM-7043?focusedWorklogId=257299&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257299 ] ASF GitHub Bot logged work on BEAM-7043: Author: ASF GitHub Bot Created on: 10/Jun/19 23:58 Start Date: 10/Jun/19 23:58 Worklog Time Spent: 10m Work Description: cmachgodaddy commented on issue #8390: [BEAM-7043] Add DynamoDBIO URL: https://github.com/apache/beam/pull/8390#issuecomment-500635479 @iemejia , I put together my implementation for the Reader with ScanResult (as return type) in the last commit, do you mind to take a look? However the test case of the Reader is failing, so I would love to have your instructions to get it work. And regarding the Writer with PCollection as return type as you requested, I still couldn't get it work (the reasons I explained above). Can you give me some instructions as well? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257299) Time Spent: 15.5h (was: 15h 20m) > Add DynamoDBIO > -- > > Key: BEAM-7043 > URL: https://issues.apache.org/jira/browse/BEAM-7043 > Project: Beam > Issue Type: New Feature > Components: io-java-aws >Reporter: Cam Mach >Assignee: Cam Mach >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 15.5h > Remaining Estimate: 0h > > Currently we don't have any feature to write data to AWS DynamoDB. This > feature will enable us to send data to DynamoDB -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7044) Spark portable runner: support user state
[ https://issues.apache.org/jira/browse/BEAM-7044?focusedWorklogId=257298&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257298 ] ASF GitHub Bot logged work on BEAM-7044: Author: ASF GitHub Bot Created on: 10/Jun/19 23:57 Start Date: 10/Jun/19 23:57 Worklog Time Spent: 10m Work Description: ibzib commented on issue #8812: [BEAM-7044] portable Spark: support stateful dofns URL: https://github.com/apache/beam/pull/8812#issuecomment-500635374 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257298) Time Spent: 50m (was: 40m) > Spark portable runner: support user state > - > > Key: BEAM-7044 > URL: https://issues.apache.org/jira/browse/BEAM-7044 > Project: Beam > Issue Type: New Feature > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7424) Retry HTTP 429 errors from GCS w/ exponential backoff when reading data
[ https://issues.apache.org/jira/browse/BEAM-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860421#comment-16860421 ] Chamikara Jayalath commented on BEAM-7424: -- There are a few things planned here: (1) Add retry logic for the Java and Python SDKs. (2) For the Dataflow runner, plumb throttled time to the Dataflow backend so it is considered when making autoscaling decisions. This bug is for (1). It would be great to get (2) into 2.14 as well, but I'm not sure the timelines will match. > Retry HTTP 429 errors from GCS w/ exponential backoff when reading data > --- > > Key: BEAM-7424 > URL: https://issues.apache.org/jira/browse/BEAM-7424 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, io-python-gcp, sdk-py-core >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Blocker > Fix For: 2.14.0 > > > This has to be done for both Java and Python SDKs. > Seems like Java SDK already retries 429 errors w/o backoff (please verify): > [https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java#L185] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
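A retry loop with jittered exponential backoff for 429 responses can be sketched as follows. This is a generic illustration, not the SDK's actual `RetryHttpRequestInitializer`; the `status` attribute on the exception is an assumed shape for the example:

```python
import random
import time

def call_with_backoff(request, max_attempts=5, base=1.0, cap=32.0):
    """Call `request` (a zero-arg callable), retrying HTTP 429 failures.

    Sleeps at most min(cap, base * 2**attempt) seconds between attempts,
    with full jitter so concurrent workers do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception as exc:
            # Only 429 (rate limited) is retryable here; re-raise anything
            # else, and give up once the attempt budget is spent.
            if getattr(exc, "status", None) != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(random.uniform(0.0, min(cap, base * 2 ** attempt)))
```

The cumulative sleep time accrued in such a loop is the "throttled time" that item (2) in the comment above would report to the Dataflow backend.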
[jira] [Assigned] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Jayalath reassigned BEAM-3788: Assignee: (was: Chamikara Jayalath) > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Jayalath >Priority: Major > > This will be implemented using the Splittable DoFn framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860419#comment-16860419 ] Chamikara Jayalath commented on BEAM-3788: -- This is blocked till we have a streaming source framework for Python SDK for portable runners. To this end, Splittable DoFn is currently under development. Note that, for folks using Flink runner, native Kafka source of Flink is currently available to Python SDK users through the cross-language transform API. > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Jayalath >Assignee: Chamikara Jayalath >Priority: Major > > This will be implemented using the Splittable DoFn framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.
[ https://issues.apache.org/jira/browse/BEAM-6892?focusedWorklogId=257293&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257293 ] ASF GitHub Bot logged work on BEAM-6892: Author: ASF GitHub Bot Created on: 10/Jun/19 23:37 Start Date: 10/Jun/19 23:37 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #8135: [BEAM-6892] Adding support for auto-creating buckets for BigQuery file loads URL: https://github.com/apache/beam/pull/8135#issuecomment-500631749 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257293) Time Spent: 7h 10m (was: 7h) > Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS > if not specified by user. > --- > > Key: BEAM-6892 > URL: https://issues.apache.org/jira/browse/BEAM-6892 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.15.0 > > Time Spent: 7h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257289&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257289 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 23:28 Start Date: 10/Jun/19 23:28 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #8811: [BEAM-7389] Add element-wise transform common code URL: https://github.com/apache/beam/pull/8811#discussion_r292228951 ## File path: sdks/python/apache_beam/examples/snippets/util.py ## @@ -0,0 +1,35 @@ +import argparse Review comment: maybe add a util_test.py for the methods here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257289) Time Spent: 1.5h (was: 1h 20m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7044) Spark portable runner: support user state
[ https://issues.apache.org/jira/browse/BEAM-7044?focusedWorklogId=257280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257280 ] ASF GitHub Bot logged work on BEAM-7044: Author: ASF GitHub Bot Created on: 10/Jun/19 23:16 Start Date: 10/Jun/19 23:16 Worklog Time Spent: 10m Work Description: ibzib commented on issue #8812: [BEAM-7044] portable Spark: support stateful dofns URL: https://github.com/apache/beam/pull/8812#issuecomment-500627536 Run Java Spark PortableValidatesRunner Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257280) Time Spent: 0.5h (was: 20m) > Spark portable runner: support user state > - > > Key: BEAM-7044 > URL: https://issues.apache.org/jira/browse/BEAM-7044 > Project: Beam > Issue Type: New Feature > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7044) Spark portable runner: support user state
[ https://issues.apache.org/jira/browse/BEAM-7044?focusedWorklogId=257281&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257281 ] ASF GitHub Bot logged work on BEAM-7044: Author: ASF GitHub Bot Created on: 10/Jun/19 23:16 Start Date: 10/Jun/19 23:16 Worklog Time Spent: 10m Work Description: ibzib commented on issue #8812: [BEAM-7044] portable Spark: support stateful dofns URL: https://github.com/apache/beam/pull/8812#issuecomment-500627601 Run Python Spark ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257281) Time Spent: 40m (was: 0.5h) > Spark portable runner: support user state > - > > Key: BEAM-7044 > URL: https://issues.apache.org/jira/browse/BEAM-7044 > Project: Beam > Issue Type: New Feature > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257276&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257276 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 23:03 Start Date: 10/Jun/19 23:03 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #8811: [BEAM-7389] Add element-wise transform code samples URL: https://github.com/apache/beam/pull/8811#issuecomment-500625003 This will add the common files infrastructure, I'll create (and link) other PRs for the rest of the transforms This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257276) Time Spent: 1h 20m (was: 1h 10m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7424) Retry HTTP 429 errors from GCS w/ exponential backoff when reading data
[ https://issues.apache.org/jira/browse/BEAM-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860399#comment-16860399 ] Anton Kedin commented on BEAM-7424: --- [~chamikara] [~heejong] what needs to be done here? Is the request to add retry logic to python SDK? It's unclear from the description. *(trying to understand the scope of changes for 2.14) > Retry HTTP 429 errors from GCS w/ exponential backoff when reading data > --- > > Key: BEAM-7424 > URL: https://issues.apache.org/jira/browse/BEAM-7424 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, io-python-gcp, sdk-py-core >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Blocker > Fix For: 2.14.0 > > > This has to be done for both Java and Python SDKs. > Seems like Java SDK already retries 429 errors w/o backoff (please verify): > [https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java#L185] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-7511) KafkaTable Initialization
[ https://issues.apache.org/jira/browse/BEAM-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alireza Samadianzakaria resolved BEAM-7511. --- Resolution: Fixed Fix Version/s: 2.14.0 > KafkaTable Initialization > - > > Key: BEAM-7511 > URL: https://issues.apache.org/jira/browse/BEAM-7511 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > Fix For: 2.14.0 > > Time Spent: 2h > Remaining Estimate: 0h > > This exception is thrown when a Kafka table is created: > Exception in thread "main" java.lang.NullPointerException > at > org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028) > at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251) > at > org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814) > at > org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:36) > at > org.apache.beam.sdk.extensions.sql.example.MyKafkaExample.main(MyKafkaExample.java:76) > > This happens because in > org.apache.beam.sdk.extensions.sql.meta.provider.kafka, configupdates is not > initialized anywhere and the method updateConsumerProperties is never called. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
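The root cause above — an optional config map left null because the update method is never called — is avoided by initializing the map at construction so every later merge path is safe. A language-neutral sketch of that pattern in Python (illustrative only; the real fix lives in the Java `BeamKafkaTable`/`KafkaIO` classes):

```python
class KafkaTableConfig:
    """Sketch of the fix pattern: keep config_updates an empty dict from
    the start, so a merge -- or a merge that never happens -- cannot hit
    an uninitialized (None/null) field."""

    def __init__(self, bootstrap_servers):
        self.bootstrap_servers = bootstrap_servers
        self.config_updates = {}  # never None, unlike the buggy field

    def update_consumer_properties(self, updates):
        # Safe even if this is the first (or only) call.
        self.config_updates.update(updates)
        return self.config_updates
```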
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257259&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257259 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 22:35 Start Date: 10/Jun/19 22:35 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #8811: [BEAM-7389] Add element-wise transform code samples URL: https://github.com/apache/beam/pull/8811#issuecomment-500619425 Alright, I'll try to split it into somewhat related transforms This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257259) Time Spent: 1h 10m (was: 1h) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257258&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257258 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 22:34 Start Date: 10/Jun/19 22:34 Worklog Time Spent: 10m Work Description: aaltay commented on issue #8811: [BEAM-7389] Add element-wise transform code samples URL: https://github.com/apache/beam/pull/8811#issuecomment-500619228 @davidcavazos can you split this into multiple changes with ideally ~500 line PRs (or not more than 1000). It is tricky to find reviewers for very large changes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257258) Time Spent: 1h (was: 50m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7044) Spark portable runner: support user state
[ https://issues.apache.org/jira/browse/BEAM-7044?focusedWorklogId=257257&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257257 ] ASF GitHub Bot logged work on BEAM-7044: Author: ASF GitHub Bot Created on: 10/Jun/19 22:33 Start Date: 10/Jun/19 22:33 Worklog Time Spent: 10m Work Description: ibzib commented on issue #8812: [BEAM-7044] portable Spark: support stateful dofns URL: https://github.com/apache/beam/pull/8812#issuecomment-500618942 Run Java Spark PortableValidatesRunner Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257257) Time Spent: 20m (was: 10m) > Spark portable runner: support user state > - > > Key: BEAM-7044 > URL: https://issues.apache.org/jira/browse/BEAM-7044 > Project: Beam > Issue Type: New Feature > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.
[ https://issues.apache.org/jira/browse/BEAM-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada updated BEAM-6892: Fix Version/s: (was: 2.14.0) 2.15.0 > Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS > if not specified by user. > --- > > Key: BEAM-6892 > URL: https://issues.apache.org/jira/browse/BEAM-6892 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.15.0 > > Time Spent: 7h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7425) Reading BigQuery Table Data into Java Classes(Pojo) Directly
[ https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-7425: -- Component/s: (was: beam-model) io-java-gcp io-java-avro > Reading BigQuery Table Data into Java Classes(Pojo) Directly > > > Key: BEAM-7425 > URL: https://issues.apache.org/jira/browse/BEAM-7425 > Project: Beam > Issue Type: New Feature > Components: io-java-avro, io-java-gcp >Affects Versions: 2.12.0 > Environment: Dataflow >Reporter: Kishan Kumar >Priority: Major > > While Developing my code I used the below code snippet to read the table data > from BigQuery. > > {code:java} > PCollection gpseEftReasonCodes = input > .apply("Reading xxyyzz", > BigQueryIO > .read(new ReadTable(ReasonCode.class)) > .withoutValidation() > .withTemplateCompatibility() > .fromQuery("Select * from dataset.xxyyzz") > .usingStandardSql() > .withCoder(SerializableCoder.of(xxyyzz.class)) > {code} > Read Table Class: > {code:java} > @DefaultSchema(JavaBeanSchema.class) > public class ReadTable implements SerializableFunction > { > private static final long serialVersionUID = 1L; > private static Gson gson = new Gson(); > public static final Logger LOG = LoggerFactory.getLogger(ReadTable.class); > private final Counter countingRecords = > Metrics.counter(ReadTable.class, "Reading Records EFT Report"); > private Class class1; > > public ReadTable(Class class1) { this.class1 = class1; } > > public T apply(SchemaAndRecord schemaAndRecord) { > Map mapping = new HashMap<>(); > int counter = 0; > try { > GenericRecord s = schemaAndRecord.getRecord(); > org.apache.avro.Schema s1 = s.getSchema(); > for (Field f : s1.getFields()) { > counter++; > mapping.put(f.name(), null==s.get(f.name()) ? 
null : > String.valueOf(s.get(counter))); > } > countingRecords.inc(); > JsonElement jsonElement = gson.toJsonTree(mapping); > return gson.fromJson(jsonElement, class1); > } catch (Exception mp) { > LOG.error("Found Wrong Mapping for the Record: "+mapping); > mp.printStackTrace(); return null; } > } > } > {code} > So After Reading the data from Bigquery I was mapping data from > SchemaAndRecord to pojo I was getting value for columns whose Data type is > Numeric mention below. > {code} > last_update_amount=java.nio.HeapByteBuffer[pos=0 lim=16 cap=16] > {code} > My Expectation was I will get exact value but getting the HyperByte Buffer > the version I am using is Apache beam 2.12.0. If any more information is > needed then please let me know. > Way 2 Tried: > {code:java} > GenericRecord s = schemaAndRecord.getRecord(); > org.apache.avro.Schema s1 = s.getSchema(); > for (Field f : s1.getFields()) { > counter++; > mapping.put(f.name(), null==s.get(f.name()) ? null : > String.valueOf(s.get(counter))); > if(f.name().equalsIgnoreCase("reason_code_id")) { > BigDecimal numericValue = new Conversions.DecimalConversion() >.fromBytes((ByteBuffer) s.get(f.name()), Schema.create(s1.getType()), > s1.getLogicalType()); >System.out.println("Numeric Con"+numericValue); > } else { > System.out.println("Else Condition "+f.name()); > } > {code} > Facing Issue: > {code} > 2019-05-24 (14:10:37) org.apache.avro.AvroRuntimeException: Can't create a: > RECORD > {code} > > It would be Great if we have a method which maps all the BigQuery Data with > Pojo Schema which Means if I have 10 Columns in BQ and in my Pojo I need only > 5 Column then, in that case, BigQueryIO should map only that 5 Data values > into Java Class and Rest will be Rejected As I am Doing After So much Effort. > Numeric Data Type must be Deserialize by itself while fetching data like > TableRow. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
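On the NUMERIC point in the report (`last_update_amount=java.nio.HeapByteBuffer[...]`): BigQuery's Avro export encodes NUMERIC with the Avro `decimal` logical type — a big-endian two's-complement unscaled integer in bytes, with scale 9. The decoding the reporter attempted via `Conversions.DecimalConversion` amounts to the following, shown here as a standalone Python sketch rather than the Beam/Avro Java API:

```python
import decimal

def decode_avro_decimal(raw: bytes, scale: int = 9) -> decimal.Decimal:
    # Avro `decimal` logical type: the bytes hold a big-endian
    # two's-complement unscaled integer.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Shift the decimal point left by `scale` digits
    # (scale is fixed at 9 for BigQuery NUMERIC).
    return decimal.Decimal(unscaled).scaleb(-scale)
```

In the Java snippet above, the `AvroRuntimeException: Can't create a: RECORD` comes from passing the record's schema/logical type to the conversion instead of the individual field's; the bytes themselves decode as sketched here.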
[jira] [Work logged] (BEAM-7044) Spark portable runner: support user state
[ https://issues.apache.org/jira/browse/BEAM-7044?focusedWorklogId=257254&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257254 ] ASF GitHub Bot logged work on BEAM-7044: Author: ASF GitHub Bot Created on: 10/Jun/19 22:30 Start Date: 10/Jun/19 22:30 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #8812: [BEAM-7044] portable Spark: support stateful dofns URL: https://github.com/apache/beam/pull/8812 Depends on #8802. We need a new custom group by key function so that we can run the executable stage function separately for each key, while keeping the original key/value pairs. Something I'm not too proud of here is re-using helper functions within typing constraints set by guava `Iterables.transform`. I could've just as easily copied these functions into lambdas or something, so let me know what you think. R: @iemejia @angoenka @mxm Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/
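The custom group-by-key the PR above describes (running the executable stage separately per key while keeping the original key/value pairs) can be illustrated with a plain-Java sketch; the names here are hypothetical and this is not the Spark runner's actual implementation:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupKeepPairs {
    // Group key/value pairs by key, but keep each original pair intact in
    // the grouped output (rather than collapsing to bare values), so a
    // downstream stage can still see the full KV for every element.
    static <K, V> Map<K, List<Map.Entry<K, V>>> groupKeepingPairs(List<Map.Entry<K, V>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(Map.Entry::getKey));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
            Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        // Each group still holds complete pairs, e.g. key "a" -> [a=1, a=3]
        System.out.println(groupKeepingPairs(pairs));
    }
}
```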
[jira] [Updated] (BEAM-7425) Reading BigQuery Table Data into Java Classes(Pojo) Directly
[ https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-7425: -- Flags: (was: Patch,Important) > Reading BigQuery Table Data into Java Classes(Pojo) Directly > (description unchanged; see the first update above)
[jira] [Updated] (BEAM-7425) Reading BigQuery Table Data into Java Classes(Pojo) Directly
[ https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-7425: -- Status: Open (was: Triage Needed) > Reading BigQuery Table Data into Java Classes(Pojo) Directly > Labels: newbie, patch > (description unchanged; see the first update above)
[jira] [Updated] (BEAM-7425) Reading BigQuery Table Data into Java Classes(Pojo) Directly
[ https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-7425: -- Labels: (was: newbie patch) > Reading BigQuery Table Data into Java Classes(Pojo) Directly > (description unchanged; see the first update above)
[jira] [Updated] (BEAM-7425) Reading BigQuery Table Data into Java Classes(Pojo) Directly
[ https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-7425: -- Fix Version/s: (was: 2.14.0) > Reading BigQuery Table Data into Java Classes(Pojo) Directly > (description unchanged; see the first update above)
[jira] [Commented] (BEAM-7425) Reading BigQuery Table Data into Java Classes(Pojo) Directly
[ https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860390#comment-16860390 ] Anton Kedin commented on BEAM-7425: --- It's not a bug or a feature request; it's a user question about how to convert the output of `BigQueryIO.read()` to POJOs. No concrete solution can be implemented for this in any specific release, so I am removing the fix version tag: it shows up during the release process and is not actionable by the release owner. I suggest asking on [u...@beam.apache.org|mailto:u...@beam.apache.org] or [d...@beam.apache.org|mailto:d...@beam.apache.org] if the issue is not resolved. Please take a look at the examples of how to use BigQueryIO.read(): [https://beam.apache.org/releases/javadoc/2.13.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html] You can use `readTableRows()` instead and get the parsed values out. Take a look at the snippets here: [https://github.com/apache/beam/blob/77cf84c634381495d45a112a9d147ad69394c0d4/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java#L168] Or follow the TableRowParser implementation for an example of how such a parser works: [https://github.com/apache/beam/blob/79d478a83be221461add1501e218b9a4308f9ec8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L449] > Reading BigQuery Table Data into Java Classes(Pojo) Directly > (description unchanged; see the first update above)
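The feature requested in BEAM-7425 (map only the columns a POJO declares and ignore the rest) can be sketched with reflection over a generic row map. This is an illustration only, under the assumption that the row has already been parsed into a `Map`; `fromRow` and the nested `ReasonCode` are hypothetical names, not Beam APIs:

```java
import java.lang.reflect.Field;
import java.util.Map;

public class RowToPojo {
    // Copy only the map entries whose keys match a declared field of the
    // target class; columns the POJO does not declare are silently ignored.
    static <T> T fromRow(Map<String, Object> row, Class<T> type) {
        try {
            T obj = type.getDeclaredConstructor().newInstance();
            for (Field f : type.getDeclaredFields()) {
                Object v = row.get(f.getName());
                if (v != null) {
                    f.setAccessible(true);
                    f.set(obj, v);
                }
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static class ReasonCode {
        public String code;
        public String name;
    }

    public static void main(String[] args) {
        // The row has three columns; the POJO declares only two of them.
        Map<String, Object> row = Map.<String, Object>of(
            "code", "X1", "name", "test", "unused_col", 42);
        ReasonCode rc = fromRow(row, ReasonCode.class);
        System.out.println(rc.code + " " + rc.name);
    }
}
```

A production version would also need type coercion (e.g. the NUMERIC-to-BigDecimal decoding discussed above) rather than assuming values already match the field types.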
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257249&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257249 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 22:22 Start Date: 10/Jun/19 22:22 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #8811: [BEAM-7389] Add element-wise transform code samples URL: https://github.com/apache/beam/pull/8811#issuecomment-500616314 R: @aaltay R: @chamikaramj R: @pabloem This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257249) Time Spent: 40m (was: 0.5h) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257251&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257251 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 22:22 Start Date: 10/Jun/19 22:22 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #8811: [BEAM-7389] Add element-wise transform code samples URL: https://github.com/apache/beam/pull/8811#issuecomment-500616314 R: @aaltay R: @chamikaramj R: @pabloem Please review whenever you have a chance. It's a long list of changes, but it's mostly very simple code, so it shouldn't take too long. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257251) Time Spent: 50m (was: 40m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257248&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257248 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 22:21 Start Date: 10/Jun/19 22:21 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #8811: [BEAM-7389] Add element-wise transform code samples URL: https://github.com/apache/beam/pull/8811#issuecomment-500616084 R: @aaltay @chamikaramj @pabloem This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257248) Time Spent: 0.5h (was: 20m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257247&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257247 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 22:21 Start Date: 10/Jun/19 22:21 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #8811: [BEAM-7389] Add element-wise transform code samples URL: https://github.com/apache/beam/pull/8811#issuecomment-500616084 R: @aaltay @chamikaramj @pabloem This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257247) Time Spent: 20m (was: 10m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257246&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257246 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 22:20 Start Date: 10/Jun/19 22:20 Worklog Time Spent: 10m Work Description: davidcavazos commented on pull request #8811: [BEAM-7389] Add element-wise transform code samples URL: https://github.com/apache/beam/pull/8811 This adds Python code snippets for the element-wise transforms. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
Post-Commit Tests Status (on master branch): [standard Beam PR-template build badge table; see the BEAM-7044 worklog above]
[jira] [Assigned] (BEAM-7525) support hash_fn for Python ApproximateUniqueCombineFn transform
[ https://issues.apache.org/jira/browse/BEAM-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hannah Jiang reassigned BEAM-7525: -- Assignee: Hannah Jiang > support hash_fn for Python ApproximateUniqueCombineFn transform > --- > > Key: BEAM-7525 > URL: https://issues.apache.org/jira/browse/BEAM-7525 > Project: Beam > Issue Type: Task > Components: sdk-py-core >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Minor > > ApproximateUniqueCombineFn is using default hash function for estimation. > We can pass a hash_fn and overwrite default hash function to support better > estimation performance. > Parent issue: BEAM-6693 > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
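As context for the proposal, here is a minimal, standalone sketch (plain Python, not the actual apache_beam.transforms.stats API; all names are hypothetical) of how a pluggable hash_fn, defaulting to the built-in hash, fits into a k-minimum-values distinct-count estimator of the kind ApproximateUniqueCombineFn implements:

```python
import hashlib
import heapq

def approximate_unique(items, sample_size=16, hash_fn=hash):
    """Estimate the number of distinct items via a k-minimum-values sketch.

    `hash_fn` is pluggable and defaults to Python's built-in `hash`,
    mirroring what this ticket proposes for ApproximateUniqueCombineFn.
    """
    space = 2 ** 64
    heap = []        # max-heap (negated values) holding the smallest distinct hashes
    members = set()  # hash values currently held in the heap
    for item in items:
        h = hash_fn(item) % space
        if h in members:
            continue
        if len(heap) < sample_size:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # Replace the largest retained hash with the new, smaller one.
            members.discard(-heapq.heappushpop(heap, -h))
            members.add(h)
    if len(heap) < sample_size:
        return len(heap)  # exact: fewer distinct hashes than the sample size
    kth = -heap[0]        # k-th smallest hash value seen overall
    return int((sample_size - 1) * space / kth)

def blake2b_hash(value):
    # A better-spreading hash_fn a user might pass instead of the default.
    return int.from_bytes(
        hashlib.blake2b(str(value).encode(), digest_size=8).digest(), "big")
```

With the default `hash`, small Python ints hash to themselves, which spreads poorly for this estimator; a caller could pass a function such as `blake2b_hash` above instead, which is exactly the kind of flexibility the ticket asks for.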
[jira] [Updated] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort
[ https://issues.apache.org/jira/browse/BEAM-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-5519: --- Fix Version/s: (was: 2.14.0) 2.15.0 > Spark Streaming Duplicated Encoding/Decoding Effort > --- > > Key: BEAM-5519 > URL: https://issues.apache.org/jira/browse/BEAM-5519 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Winkelman >Assignee: Kyle Winkelman >Priority: Major > Labels: spark, spark-streaming > Fix For: 2.15.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > When using the SparkRunner in streaming mode. There is a call to groupByKey > followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this > used to cause 2 shuffles but it still causes 2 encode/decode cycles. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5775) Make the spark runner not serialize data unless spark is spilling to disk
[ https://issues.apache.org/jira/browse/BEAM-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860383#comment-16860383 ] Ismaël Mejía commented on BEAM-5775: No, we still need to evaluate the performance consequences of the change, moving to next release. > Make the spark runner not serialize data unless spark is spilling to disk > - > > Key: BEAM-5775 > URL: https://issues.apache.org/jira/browse/BEAM-5775 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Mike Kaplinskiy >Assignee: Mike Kaplinskiy >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 11h 10m > Remaining Estimate: 0h > > Currently for storage level MEMORY_ONLY, Beam does not coder-ify the data. > This lets Spark keep the data in memory avoiding the serialization round > trip. Unfortunately the logic is fairly coarse - as soon as you switch to > MEMORY_AND_DISK, Beam coder-ifys the data even though Spark might have chosen > to keep the data in memory, incurring the serialization overhead. > > Ideally Beam would serialize the data lazily - as Spark chooses to spill to > disk. This would be a change in behavior when using beam, but luckily Spark > has a solution for folks that want data serialized in memory - > MEMORY_AND_DISK_SER will keep the data serialized. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5775) Make the spark runner not serialize data unless spark is spilling to disk
[ https://issues.apache.org/jira/browse/BEAM-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-5775: --- Fix Version/s: (was: 2.14.0) 2.15.0 > Make the spark runner not serialize data unless spark is spilling to disk > - > > Key: BEAM-5775 > URL: https://issues.apache.org/jira/browse/BEAM-5775 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Mike Kaplinskiy >Assignee: Mike Kaplinskiy >Priority: Minor > Fix For: 2.15.0 > > Time Spent: 11h 10m > Remaining Estimate: 0h > > Currently for storage level MEMORY_ONLY, Beam does not coder-ify the data. > This lets Spark keep the data in memory avoiding the serialization round > trip. Unfortunately the logic is fairly coarse - as soon as you switch to > MEMORY_AND_DISK, Beam coder-ifys the data even though Spark might have chosen > to keep the data in memory, incurring the serialization overhead. > > Ideally Beam would serialize the data lazily - as Spark chooses to spill to > disk. This would be a change in behavior when using beam, but luckily Spark > has a solution for folks that want data serialized in memory - > MEMORY_AND_DISK_SER will keep the data serialized. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort
[ https://issues.apache.org/jira/browse/BEAM-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860355#comment-16860355 ] Anton Kedin commented on BEAM-5519: --- It doesn't look like this should be treated as a blocker for 2.14 as well? If so, then I suggest removing the concrete fix version. > Spark Streaming Duplicated Encoding/Decoding Effort > --- > > Key: BEAM-5519 > URL: https://issues.apache.org/jira/browse/BEAM-5519 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Winkelman >Assignee: Kyle Winkelman >Priority: Major > Labels: spark, spark-streaming > Fix For: 2.14.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > When using the SparkRunner in streaming mode, there is a call to groupByKey > followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this > used to cause 2 shuffles but it still causes 2 encode/decode cycles. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=257222&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257222 ] ASF GitHub Bot logged work on BEAM-6693: Author: ASF GitHub Bot Created on: 10/Jun/19 21:38 Start Date: 10/Jun/19 21:38 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8799: [BEAM-6693] replace mmh3 with default hash function URL: https://github.com/apache/beam/pull/8799 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257222) Time Spent: 12.5h (was: 12h 20m) > ApproximateUnique transform for Python SDK > -- > > Key: BEAM-6693 > URL: https://issues.apache.org/jira/browse/BEAM-6693 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Hannah Jiang >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 12.5h > Remaining Estimate: 0h > > Add a PTransform for estimating the number of distinct elements in a > PCollection and the number of distinct values associated with each key in a > PCollection KVs. > it should offer the same API as its Java counterpart: > https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6928) Make Python SDK custom Sink the default Sink for BigQuery
[ https://issues.apache.org/jira/browse/BEAM-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada updated BEAM-6928: Fix Version/s: (was: 2.14.0) 2.15.0 > Make Python SDK custom Sink the default Sink for BigQuery > - > > Key: BEAM-6928 > URL: https://issues.apache.org/jira/browse/BEAM-6928 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Blocker > Fix For: 2.15.0 > > > This is for 2.14.0 - please bump version to 2.14.0 when doing 2.13.0 release. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5775) Make the spark runner not serialize data unless spark is spilling to disk
[ https://issues.apache.org/jira/browse/BEAM-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860346#comment-16860346 ] Anton Kedin commented on BEAM-5775: --- Is this on track for 2.14 release (branch cut next week)? > Make the spark runner not serialize data unless spark is spilling to disk > - > > Key: BEAM-5775 > URL: https://issues.apache.org/jira/browse/BEAM-5775 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Mike Kaplinskiy >Assignee: Mike Kaplinskiy >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 11h 10m > Remaining Estimate: 0h > > Currently for storage level MEMORY_ONLY, Beam does not coder-ify the data. > This lets Spark keep the data in memory avoiding the serialization round > trip. Unfortunately the logic is fairly coarse - as soon as you switch to > MEMORY_AND_DISK, Beam coder-ifys the data even though Spark might have chosen > to keep the data in memory, incurring the serialization overhead. > > Ideally Beam would serialize the data lazily - as Spark chooses to spill to > disk. This would be a change in behavior when using beam, but luckily Spark > has a solution for folks that want data serialized in memory - > MEMORY_AND_DISK_SER will keep the data serialized. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.
[ https://issues.apache.org/jira/browse/BEAM-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860345#comment-16860345 ] Anton Kedin commented on BEAM-6892: --- One of the linked PRs hasn't been touched in a while. Does it make sense to keep targeting 2.14 (branch will be cut next week)? > Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS > if not specified by user. > --- > > Key: BEAM-6892 > URL: https://issues.apache.org/jira/browse/BEAM-6892 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.14.0 > > Time Spent: 7h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7511) KafkaTable Initialization
[ https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=257215&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257215 ] ASF GitHub Bot logged work on BEAM-7511: Author: ASF GitHub Bot Created on: 10/Jun/19 21:25 Start Date: 10/Jun/19 21:25 Worklog Time Spent: 10m Work Description: akedin commented on pull request #8797: [BEAM-7511] Fixes the bug in KafkaTable Initialization. URL: https://github.com/apache/beam/pull/8797 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257215) Time Spent: 2h (was: 1h 50m) > KafkaTable Initialization > - > > Key: BEAM-7511 > URL: https://issues.apache.org/jira/browse/BEAM-7511 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > This exception is thrown when a Kafka table is created: 
> Exception in thread "main" java.lang.NullPointerException > at > org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028) > at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251) > at > org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814) > at > org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:36) > at > org.apache.beam.sdk.extensions.sql.example.MyKafkaExample.main(MyKafkaExample.java:76) > > This happens because in > org.apache.beam.sdk.extensions.sql.meta.provider.kafka, configupdates is not > initialized anywhere and the method updateConsumerProperties is never called. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6952) concatenated compressed files bug with python sdk
[ https://issues.apache.org/jira/browse/BEAM-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860341#comment-16860341 ] Anton Kedin commented on BEAM-6952: --- This looks fixed, both PRs are merged now. Safe to close? > concatenated compressed files bug with python sdk > - > > Key: BEAM-6952 > URL: https://issues.apache.org/jira/browse/BEAM-6952 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.11.0 >Reporter: Daniel Lescohier >Priority: Major > Fix For: 2.14.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > The Python apache_beam.io.filesystem module has a bug handling concatenated > compressed files. > The PR I will create has two commits: > # a new unit test that shows the problem > # a fix to the problem. > The unit test is added to the apache_beam.io.filesystem_test module. It was > added to this module because the test > apache_beam.io.textio_test.test_read_gzip_concat does not encounter the > problem in the Beam 2.11 and earlier code base: its test data is smaller > than read_size, so it goes through logic that avoids the problem. So, the > new test sets read_size smaller and the test data bigger, in order to > encounter the problem. It would be difficult to test in the textio_test > module, because you'd need very large test data, since the default > read_size is 16 MiB and the ReadFromText interface does not allow you to > modify the read_size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
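The fix itself lives in apache_beam.io.filesystem; the underlying pitfall can be shown with nothing but the standard library. A gzip stream may contain several concatenated members, and a single zlib decompressor stops at the end of the first one:

```python
import gzip
import zlib

# Two gzip members concatenated, as produced by e.g. `cat a.gz b.gz`.
data = gzip.compress(b"hello ") + gzip.compress(b"world")

# The gzip module itself handles multi-member streams:
assert gzip.decompress(data) == b"hello world"

# A single zlib.decompressobj stops at the end of the first member; the
# rest of the stream is left in `unused_data` and must be fed to a fresh
# decompressor -- forgetting this silently truncates the output.
def decompress_concatenated(raw):
    out = []
    while raw:
        d = zlib.decompressobj(zlib.MAX_WBITS | 16)  # 16 -> expect a gzip wrapper
        out.append(d.decompress(raw))
        raw = d.unused_data
    return b"".join(out)
```

This is only an illustration of the failure mode the new filesystem_test exercises, not the Beam code itself.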
[jira] [Commented] (BEAM-7144) Job re-scale fails on Flink 1.7
[ https://issues.apache.org/jira/browse/BEAM-7144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860339#comment-16860339 ] Anton Kedin commented on BEAM-7144: --- If this has been present in all Beam releases since a few versions back, do we still want to target/block a specific Beam release and keep pushing it out? > Job re-scale fails on Flink 1.7 > --- > > Key: BEAM-7144 > URL: https://issues.apache.org/jira/browse/BEAM-7144 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.11.0 >Reporter: Jozef Vilcek >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.14.0 > > > I am unable to rescale the job after moving it to Flink runner 1.7. What I am > doing is: > # Recompile the job code with only the Flink runner version swapped, 1.5 -> 1.7 > # Run the streaming job with parallelism 112 and maxParallelism 448 > # Wait until a checkpoint is taken > # Stop the job > # Run the job again with parallelism 224 and the checkpoint path to restore from > # The job fails > The same happens if I try to increase parallelism. This procedure works for > the same job compiled with Flink runner 1.5 and run on 1.5.0. It fails with > runner 1.7 on Flink 1.7.2. > Exception is: > {noformat} > java.lang.Exception: Exception while creating StreamOperatorStateContext. 
> at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for > WindowDoFnOperator_2b6af61dc418f10e82551367a7e7f78e_(83/224) from any of the > 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:284) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 
5 more > Caused by: java.lang.IndexOutOfBoundsException: Index: 101, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:653) > at java.util.ArrayList.get(ArrayList.java:429) > at > com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42) > at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:759) > at > org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:315) > at > org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:73) > at > org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.readKeyGroupStateData(HeapKeyedStateBackend.java:492) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.readStateHandleStateData(HeapKeyedStateBackend.java:453) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:410) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:358) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:104) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123) > ... 7 more{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines
[ https://issues.apache.org/jira/browse/BEAM-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanay Tummalapalli resolved BEAM-7437. -- Resolution: Fixed Thank you [~kedin]. Yes, the merged PR resolved this. > Integration Test for BQ streaming inserts for streaming pipelines > - > > Key: BEAM-7437 > URL: https://issues.apache.org/jira/browse/BEAM-7437 > Project: Beam > Issue Type: Test > Components: io-python-gcp >Affects Versions: 2.12.0 >Reporter: Tanay Tummalapalli >Assignee: Tanay Tummalapalli >Priority: Minor > Labels: test > Fix For: 2.14.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Integration Test for BigQuery Sink using Streaming Inserts for streaming > pipelines. > Integration tests currently exist for batch pipelines; they can also be added > for streaming pipelines using TestStream. This will be a precursor to the > failing integration test to be added for [BEAM-6611| > https://issues.apache.org/jira/browse/BEAM-6611]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7425) Reading BigQuery Table Data into Java Classes(Pojo) Directly
[ https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-7425: -- Description: While developing my code I used the below code snippet to read the table data from BigQuery.

{code:java}
PCollection gpseEftReasonCodes = input
    .apply("Reading xxyyzz", BigQueryIO
        .read(new ReadTable(ReasonCode.class))
        .withoutValidation()
        .withTemplateCompatibility()
        .fromQuery("Select * from dataset.xxyyzz")
        .usingStandardSql()
        .withCoder(SerializableCoder.of(xxyyzz.class))
{code}

Read Table class:

{code:java}
@DefaultSchema(JavaBeanSchema.class)
public class ReadTable implements SerializableFunction {
  private static final long serialVersionUID = 1L;
  private static Gson gson = new Gson();
  public static final Logger LOG = LoggerFactory.getLogger(ReadTable.class);
  private final Counter countingRecords =
      Metrics.counter(ReadTable.class, "Reading Records EFT Report");
  private Class class1;

  public ReadTable(Class class1) {
    this.class1 = class1;
  }

  public T apply(SchemaAndRecord schemaAndRecord) {
    Map mapping = new HashMap<>();
    int counter = 0;
    try {
      GenericRecord s = schemaAndRecord.getRecord();
      org.apache.avro.Schema s1 = s.getSchema();
      for (Field f : s1.getFields()) {
        counter++;
        mapping.put(f.name(), null == s.get(f.name()) ? null : String.valueOf(s.get(counter)));
      }
      countingRecords.inc();
      JsonElement jsonElement = gson.toJsonTree(mapping);
      return gson.fromJson(jsonElement, class1);
    } catch (Exception mp) {
      LOG.error("Found Wrong Mapping for the Record: " + mapping);
      mp.printStackTrace();
      return null;
    }
  }
}
{code}

After reading the data from BigQuery and mapping it from SchemaAndRecord to the POJO, the value I got for columns whose data type is NUMERIC is shown below.

{code}
last_update_amount=java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]
{code}

I expected the exact value but am getting the HeapByteBuffer instead; the version I am using is Apache Beam 2.12.0. 
If any more information is needed, please let me know.

Second approach tried:

{code:java}
GenericRecord s = schemaAndRecord.getRecord();
org.apache.avro.Schema s1 = s.getSchema();
for (Field f : s1.getFields()) {
  counter++;
  mapping.put(f.name(), null == s.get(f.name()) ? null : String.valueOf(s.get(counter)));
  if (f.name().equalsIgnoreCase("reason_code_id")) {
    BigDecimal numericValue = new Conversions.DecimalConversion()
        .fromBytes((ByteBuffer) s.get(f.name()), Schema.create(s1.getType()), s1.getLogicalType());
    System.out.println("Numeric Con" + numericValue);
  } else {
    System.out.println("Else Condition " + f.name());
  }
{code}

Issue faced:

{code}
2019-05-24 (14:10:37) org.apache.avro.AvroRuntimeException: Can't create a: RECORD
{code}

It would be great to have a method that maps the BigQuery data onto the POJO schema, meaning that if I have 10 columns in BQ and my POJO needs only 5 of them, then BigQueryIO should map only those 5 values into the Java class and reject the rest, which I am currently doing by hand. The NUMERIC data type should be deserialized automatically while fetching data, as it is for TableRow. was: While Developing my code I used the below code snippet to read the table data from BigQuery. PCollection gpseEftReasonCodes = input. 
apply("Reading xxyyzz", BigQueryIO.read(new ReadTable(ReasonCode.class)) .withoutValidation().withTemplateCompatibility() .fromQuery("Select * from dataset.xxyyzz").usingStandardSql() .withCoder(SerializableCoder.of(xxyyzz.class)) Read Table Class: @DefaultSchema(JavaBeanSchema.class) public class ReadTable implements SerializableFunction { private static final long serialVersionUID = 1L; private static Gson gson = new Gson(); public static final Logger LOG = LoggerFactory.getLogger(ReadTable.class); private final Counter countingRecords = Metrics.counter(ReadTable.class,"Reading Records EFT Report"); private Class class1; public ReadTable(Class class1) { this.class1 = class1; } public T apply(SchemaAndRecord schemaAndRecord) { Map mapping = new HashMap<>(); int counter = 0; try { GenericRecord s = schemaAndRecord.getRecord(); org.apache.avro.Schema s1 = s.getSchema(); for (Field f : s1.getFields()) { counter++; mapping.put(f.name(), null==s.get(f.name())?null:String.valueOf(s.get(counter))); } countingRecords.inc(); JsonElement jsonElement = gson.toJsonTree(mapping); return gson.fromJson(jsonElement, class1); }catch(Exception mp) { LOG.error("Found Wrong Mapping for the Record: "+mapping); mp.printStackTrace(); return null; } } } So After Reading the data from Bigquery I was mapping data from SchemaAndRecord to pojo I was getting value for columns whose Data ty
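For readers hitting the same HeapByteBuffer surprise: BigQuery's NUMERIC arrives via Avro as a decimal logical type, i.e. a scaled big-endian two's-complement integer, which on the Java side Conversions.DecimalConversion turns into a BigDecimal. Here is a pure-Python sketch of that decoding (an illustration only, not Beam API; BigQuery NUMERIC uses scale 9):

```python
from decimal import Decimal

def decode_bq_numeric(raw, scale=9):
    # Avro's decimal logical type stores the unscaled value as a
    # big-endian two's-complement integer; BigQuery NUMERIC has scale 9.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

# 1.5 encoded with scale 9 is the unscaled integer 1_500_000_000.
raw = (1_500_000_000).to_bytes(8, "big", signed=True)
```

The point is that the bytes in the buffer are the unscaled integer, so reading them as a string (as the ReadTable mapping above does) cannot work; the scale from the logical type must be applied.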
[jira] [Created] (BEAM-7525) support hash_fn for Python ApproximateUniqueCombineFn transform
Hannah Jiang created BEAM-7525: -- Summary: support hash_fn for Python ApproximateUniqueCombineFn transform Key: BEAM-7525 URL: https://issues.apache.org/jira/browse/BEAM-7525 Project: Beam Issue Type: Task Components: sdk-py-core Reporter: Hannah Jiang ApproximateUniqueCombineFn uses the default hash function for estimation. We can pass a hash_fn and override the default hash function to support better estimation performance. Parent issue: BEAM-6693 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=257193&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257193 ] ASF GitHub Bot logged work on BEAM-6693: Author: ASF GitHub Bot Created on: 10/Jun/19 20:57 Start Date: 10/Jun/19 20:57 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #8799: [BEAM-6693] replace mmh3 with default hash function URL: https://github.com/apache/beam/pull/8799#discussion_r292188523 ## File path: sdks/python/apache_beam/transforms/stats.py ## @@ -214,7 +212,7 @@ def create_accumulator(self, *args, **kwargs): Review comment: Ticket is created: https://issues.apache.org/jira/browse/BEAM-7525 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257193) Time Spent: 12h 20m (was: 12h 10m) > ApproximateUnique transform for Python SDK > -- > > Key: BEAM-6693 > URL: https://issues.apache.org/jira/browse/BEAM-6693 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Hannah Jiang >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 12h 20m > Remaining Estimate: 0h > > Add a PTransform for estimating the number of distinct elements in a > PCollection and the number of distinct values associated with each key in a > PCollection KVs. > it should offer the same API as its Java counterpart: > https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-7524) Update Python dependencies page for 2.13.0
[ https://issues.apache.org/jira/browse/BEAM-7524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rose Nguyen closed BEAM-7524. - Resolution: Fixed Fix Version/s: Not applicable > Update Python dependencies page for 2.13.0 > -- > > Key: BEAM-7524 > URL: https://issues.apache.org/jira/browse/BEAM-7524 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: Rose Nguyen >Priority: Minor > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > Update Python dependencies page for 2.13.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7524) Update Python dependencies page for 2.13.0
[ https://issues.apache.org/jira/browse/BEAM-7524?focusedWorklogId=257190&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257190 ] ASF GitHub Bot logged work on BEAM-7524: Author: ASF GitHub Bot Created on: 10/Jun/19 20:49 Start Date: 10/Jun/19 20:49 Worklog Time Spent: 10m Work Description: melap commented on pull request #8810: [BEAM-7524] Update Python dependencies page for 2.13.0 URL: https://github.com/apache/beam/pull/8810 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257190) Time Spent: 0.5h (was: 20m) > Update Python dependencies page for 2.13.0 > -- > > Key: BEAM-7524 > URL: https://issues.apache.org/jira/browse/BEAM-7524 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: Rose Nguyen >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > Update Python dependencies page for 2.13.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7524) Update Python dependencies page for 2.13.0
[ https://issues.apache.org/jira/browse/BEAM-7524?focusedWorklogId=257189&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257189 ] ASF GitHub Bot logged work on BEAM-7524: Author: ASF GitHub Bot Created on: 10/Jun/19 20:47 Start Date: 10/Jun/19 20:47 Worklog Time Spent: 10m Work Description: rosetn commented on issue #8810: [BEAM-7524] Update Python dependencies page for 2.13.0 URL: https://github.com/apache/beam/pull/8810#issuecomment-500588383 @melap PTAL STAGED: http://apache-beam-website-pull-requests.storage.googleapis.com/8810/documentation/sdks/python-dependencies/index.html This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257189) Time Spent: 20m (was: 10m) > Update Python dependencies page for 2.13.0 > -- > > Key: BEAM-7524 > URL: https://issues.apache.org/jira/browse/BEAM-7524 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: Rose Nguyen >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Update Python dependencies page for 2.13.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=257187&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257187 ] ASF GitHub Bot logged work on BEAM-6693: Author: ASF GitHub Bot Created on: 10/Jun/19 20:43 Start Date: 10/Jun/19 20:43 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #8799: [BEAM-6693] replace mmh3 with default hash function URL: https://github.com/apache/beam/pull/8799#discussion_r292183173 ## File path: sdks/python/apache_beam/transforms/stats.py ## @@ -214,7 +212,7 @@ def create_accumulator(self, *args, **kwargs): Review comment: Suggestion (Maybe add a JIRA (does not have to be fixed in this PR) for future): - Pass a hash_fn argument to ApproximateUniqueCombineFn, that can be passed by users to customize the hash_fn they would like to use. It can default to `hash` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257187) Time Spent: 12h 10m (was: 12h) > ApproximateUnique transform for Python SDK > -- > > Key: BEAM-6693 > URL: https://issues.apache.org/jira/browse/BEAM-6693 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Hannah Jiang >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 12h 10m > Remaining Estimate: 0h > > Add a PTransform for estimating the number of distinct elements in a > PCollection and the number of distinct values associated with each key in a > PCollection KVs. > it should offer the same API as its Java counterpart: > https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
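The review suggestion above — accept a `hash_fn` argument that defaults to the builtin `hash` — can be sketched as follows. The class and method names are illustrative only (modeled loosely on a CombineFn shape), not Beam's actual API, and the accumulator is simplified to exact counting with a set rather than a bounded sample.

```python
import hashlib

def stable_hash(element):
    """Example of a user-supplied hash_fn: deterministic across processes,
    unlike the builtin hash(), whose string hashing is randomized per run."""
    digest = hashlib.md5(repr(element).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

class ApproximateUniqueAccumulator:
    """Illustrative accumulator with a pluggable hash function that
    defaults to the builtin `hash`, per the review suggestion."""
    def __init__(self, hash_fn=hash):
        self._hash_fn = hash_fn
        self._hashes = set()          # simplified: exact, not a bounded sample

    def add_input(self, element):
        self._hashes.add(self._hash_fn(element))

    def extract_output(self):
        return len(self._hashes)
```

A caller wanting run-to-run reproducibility would pass `hash_fn=stable_hash`; the default keeps the extra mmh3 dependency out, which matches the PR's goal.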
[jira] [Work logged] (BEAM-7524) Update Python dependencies page for 2.13.0
[ https://issues.apache.org/jira/browse/BEAM-7524?focusedWorklogId=257184&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257184 ] ASF GitHub Bot logged work on BEAM-7524: Author: ASF GitHub Bot Created on: 10/Jun/19 20:42 Start Date: 10/Jun/19 20:42 Worklog Time Spent: 10m Work Description: rosetn commented on pull request #8810: [BEAM-7524] Update Python dependencies page for 2.13.0 URL: https://github.com/apache/beam/pull/8810 Update Python dependencies page for 2.13.0 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](ht
[jira] [Updated] (BEAM-7511) KafkaTable Initialization
[ https://issues.apache.org/jira/browse/BEAM-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-7511: -- Priority: Major (was: Minor) > KafkaTable Initialization > - > > Key: BEAM-7511 > URL: https://issues.apache.org/jira/browse/BEAM-7511 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > This exception is thrown when a Kafka table is created: > Exception in thread "main" java.lang.NullPointerException > at > org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028) > at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251) > at > org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814) > at > org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:36) > at > org.apache.beam.sdk.extensions.sql.example.MyKafkaExample.main(MyKafkaExample.java:76) > > This happens because in > org.apache.beam.sdk.extensions.sql.meta.provider.kafka, configupdates is not > initialized anywhere and the method updateConsumerProperties is never called. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
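The root cause described above — a config-updates map that is declared but never initialized before being handed to KafkaIO's property-merging code — is a classic null-dereference pattern. The Python analog below illustrates the failure mode and the obvious guard; the class and field names are invented for illustration and are not Beam's API.

```python
class KafkaTableReader:
    """Python analog of the reported bug: BeamKafkaTable kept a
    config-updates map that was never initialized, so passing it into
    KafkaIO's property merging raised a NullPointerException.
    Names here are illustrative, not Beam's actual API."""
    def __init__(self, bootstrap_servers):
        self.bootstrap_servers = bootstrap_servers
        # The bug: this stays None because the setter is never called.
        self.config_updates = None

    def build_reader_buggy(self):
        base = {"bootstrap.servers": self.bootstrap_servers}
        base.update(self.config_updates)      # TypeError here, like the Java NPE
        return base

    def build_reader_fixed(self):
        base = {"bootstrap.servers": self.bootstrap_servers}
        base.update(self.config_updates or {})  # guard: treat missing as empty
        return base
```

Initializing the field to an empty map at construction time (or guarding at the use site, as above) removes the failure without changing behavior for callers that do supply updates.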
[jira] [Commented] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines
[ https://issues.apache.org/jira/browse/BEAM-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860315#comment-16860315 ] Anton Kedin commented on BEAM-7437: --- Is this fixed by [https://github.com/apache/beam/pull/8748] ? > Integration Test for BQ streaming inserts for streaming pipelines > - > > Key: BEAM-7437 > URL: https://issues.apache.org/jira/browse/BEAM-7437 > Project: Beam > Issue Type: Test > Components: io-python-gcp >Affects Versions: 2.12.0 >Reporter: Tanay Tummalapalli >Assignee: Tanay Tummalapalli >Priority: Minor > Labels: test > Fix For: 2.14.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Integration Test for BigQuery Sink using Streaming Inserts for streaming > pipelines. > Integration tests currently exist for batch pipelines, it can also be added > for streaming pipelines using TestStream. This will be a precursor to the > failing integration test to be added for [BEAM-6611| > https://issues.apache.org/jira/browse/BEAM-6611]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7513) Row Estimation for BigQueryTable
[ https://issues.apache.org/jira/browse/BEAM-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-7513: -- Priority: Major (was: Minor) > Row Estimation for BigQueryTable > > > Key: BEAM-7513 > URL: https://issues.apache.org/jira/browse/BEAM-7513 > Project: Beam > Issue Type: New Feature > Components: dsl-sql, io-java-gcp >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > > Calcite tables (org.apache.calcite.schema.Table) should implement the method > org.apache.calcite.schema.Statistic getStatistic(). The Statistic instance > returned by this method is used for the Volcano optimizer in Calcite. > Currently, org.apache.beam.sdk.extensions.sql.impl.BeamCalciteTable has not > implemented getStatistic() which means it uses the implementation in > org.apache.calcite.schema.impl.AbstractTable and that implementation just > returns Statistics.UNKNOWN for all sources. > > Things needed to be implemented: > 1- Implementing getStatistic in BeamCalciteTable such that it calls a row > count estimation method from BeamSqlTable and adding this method to > BeamSqlTable. > 2- Implementing the row count estimation method for BigQueryTable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
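The two implementation steps listed above matter because Calcite's Volcano optimizer consults per-table statistics when costing plans; with `Statistics.UNKNOWN` every source looks equally expensive. The toy Python sketch below, with invented names, shows why a real row-count estimate changes planner decisions such as which join input to use as the build side.

```python
UNKNOWN = None  # stand-in for Calcite's Statistics.UNKNOWN

class Table:
    """Sketch of the proposal: each table exposes a row-count estimate
    the planner can consult. Names are illustrative, not Calcite's API."""
    def __init__(self, name, estimated_rows=UNKNOWN):
        self.name = name
        self.estimated_rows = estimated_rows

def choose_join_build_side(left, right, default_rows=1_000_000):
    """A cost-based planner hashes the smaller input (the build side).
    With UNKNOWN statistics both sides get the same default cost, so the
    choice is arbitrary -- which is why returning real estimates matters."""
    def cost(table):
        if table.estimated_rows is UNKNOWN:
            return default_rows       # no statistic: fall back to a guess
        return table.estimated_rows
    return left if cost(left) <= cost(right) else right
```

With estimates available, a 500-row dimension table is correctly picked over a 10M-row fact table; with both sides UNKNOWN the planner just takes whichever side comes first.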
[jira] [Updated] (BEAM-7513) Row Estimation for BigQueryTable
[ https://issues.apache.org/jira/browse/BEAM-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-7513: -- Status: Open (was: Triage Needed) > Row Estimation for BigQueryTable > > > Key: BEAM-7513 > URL: https://issues.apache.org/jira/browse/BEAM-7513 > Project: Beam > Issue Type: New Feature > Components: dsl-sql, io-java-gcp >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > > Calcite tables (org.apache.calcite.schema.Table) should implement the method > org.apache.calcite.schema.Statistic getStatistic(). The Statistic instance > returned by this method is used for the Volcano optimizer in Calcite. > Currently, org.apache.beam.sdk.extensions.sql.impl.BeamCalciteTable has not > implemented getStatistic() which means it uses the implementation in > org.apache.calcite.schema.impl.AbstractTable and that implementation just > returns Statistics.UNKNOWN for all sources. > > Things needed to be implemented: > 1- Implementing getStatistic in BeamCalciteTable such that it calls a row > count estimation method from BeamSqlTable and adding this method to > BeamSqlTable. > 2- Implementing the row count estimation method for BigQueryTable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7516) Add a watermark manager for the fn_api_runner
[ https://issues.apache.org/jira/browse/BEAM-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-7516: -- Status: Open (was: Triage Needed) > Add a watermark manager for the fn_api_runner > - > > Key: BEAM-7516 > URL: https://issues.apache.org/jira/browse/BEAM-7516 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > > To track watermarks for each stage -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7511) KafkaTable Initialization
[ https://issues.apache.org/jira/browse/BEAM-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-7511: -- Status: Open (was: Triage Needed) > KafkaTable Initialization > - > > Key: BEAM-7511 > URL: https://issues.apache.org/jira/browse/BEAM-7511 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > This exception is thrown when a Kafka table is created: > Exception in thread "main" java.lang.NullPointerException > at > org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028) > at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251) > at > org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814) > at > org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:36) > at > org.apache.beam.sdk.extensions.sql.example.MyKafkaExample.main(MyKafkaExample.java:76) > > This happens because in > org.apache.beam.sdk.extensions.sql.meta.provider.kafka, configupdates is not > initialized anywhere and the method updateConsumerProperties is never called. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7502) Create ParDo Python Load Test Jenkins Job
[ https://issues.apache.org/jira/browse/BEAM-7502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-7502: -- Status: Open (was: Triage Needed) > Create ParDo Python Load Test Jenkins Job > - > > Key: BEAM-7502 > URL: https://issues.apache.org/jira/browse/BEAM-7502 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Kasia Kucharczyk >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7524) Update Python dependencies page for 2.13.0
Rose Nguyen created BEAM-7524: - Summary: Update Python dependencies page for 2.13.0 Key: BEAM-7524 URL: https://issues.apache.org/jira/browse/BEAM-7524 Project: Beam Issue Type: Improvement Components: website Reporter: Rose Nguyen Assignee: Rose Nguyen Update Python dependencies page for 2.13.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=257176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257176 ] ASF GitHub Bot logged work on BEAM-6693: Author: ASF GitHub Bot Created on: 10/Jun/19 20:34 Start Date: 10/Jun/19 20:34 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #8799: [BEAM-6693] replace mmh3 with default hash function URL: https://github.com/apache/beam/pull/8799#discussion_r292179692 ## File path: sdks/python/apache_beam/transforms/stats_test.py ## @@ -266,27 +251,7 @@ def test_approximate_unique_global_by_error_with_samll_population(self): >> beam.ApproximateUnique.Globally(error=est_err)) assert_that(result, equal_to([actual_count]), -label='assert:global_by_error_with_samll_population') -pipeline.run() - - def test_approximate_unique_global_by_error_with_big_population(self): Review comment: This test is more for estimation algorithm performance testing, rather than for testing functionality. With py3 default hash algorithm, it would fail sometimes (2%) because it is out of estimation error range, so I decided to remove it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257176) Time Spent: 12h (was: 11h 50m) > ApproximateUnique transform for Python SDK > -- > > Key: BEAM-6693 > URL: https://issues.apache.org/jira/browse/BEAM-6693 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Hannah Jiang >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 12h > Remaining Estimate: 0h > > Add a PTransform for estimating the number of distinct elements in a > PCollection and the number of distinct values associated with each key in a > PCollection KVs. 
> it should offer the same API as its Java counterpart: > https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=257169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257169 ] ASF GitHub Bot logged work on BEAM-6693: Author: ASF GitHub Bot Created on: 10/Jun/19 20:25 Start Date: 10/Jun/19 20:25 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8799: [BEAM-6693] replace mmh3 with default hash function URL: https://github.com/apache/beam/pull/8799#discussion_r292175998 ## File path: sdks/python/apache_beam/transforms/stats_test.py ## @@ -266,27 +251,7 @@ def test_approximate_unique_global_by_error_with_samll_population(self): >> beam.ApproximateUnique.Globally(error=est_err)) assert_that(result, equal_to([actual_count]), -label='assert:global_by_error_with_samll_population') -pipeline.run() - - def test_approximate_unique_global_by_error_with_big_population(self): Review comment: Why are you removing this test? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257169) Time Spent: 11h 40m (was: 11.5h) > ApproximateUnique transform for Python SDK > -- > > Key: BEAM-6693 > URL: https://issues.apache.org/jira/browse/BEAM-6693 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Hannah Jiang >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 11h 40m > Remaining Estimate: 0h > > Add a PTransform for estimating the number of distinct elements in a > PCollection and the number of distinct values associated with each key in a > PCollection KVs. 
> it should offer the same API as its Java counterpart: > https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=257170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257170 ] ASF GitHub Bot logged work on BEAM-6693: Author: ASF GitHub Bot Created on: 10/Jun/19 20:25 Start Date: 10/Jun/19 20:25 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8799: [BEAM-6693] replace mmh3 with default hash function URL: https://github.com/apache/beam/pull/8799#issuecomment-500580749 Thanks Hannah. This LGTM. I just left one question. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257170) Time Spent: 11h 50m (was: 11h 40m) > ApproximateUnique transform for Python SDK > -- > > Key: BEAM-6693 > URL: https://issues.apache.org/jira/browse/BEAM-6693 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Hannah Jiang >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 11h 50m > Remaining Estimate: 0h > > Add a PTransform for estimating the number of distinct elements in a > PCollection and the number of distinct values associated with each key in a > PCollection KVs. > it should offer the same API as its Java counterpart: > https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7450) Unbounded HCatalogIO Reader using splittable pardos
[ https://issues.apache.org/jira/browse/BEAM-7450?focusedWorklogId=257068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257068 ] ASF GitHub Bot logged work on BEAM-7450: Author: ASF GitHub Bot Created on: 10/Jun/19 19:01 Start Date: 10/Jun/19 19:01 Worklog Time Spent: 10m Work Description: jhalaria commented on issue #8718: [BEAM-7450] Add an unbounded HcatalogIO reader using splittable pardo URL: https://github.com/apache/beam/pull/8718#issuecomment-500535619 @iemejia - I think once this PR gets merged, the next step will be to clean up the multiple read methods that we currently have. Moving forward there will only be one Read class that can read both bounded and unbounded data depending on the parameters passed. Do you think we can tackle that as the next step, or if not, what would make it cleaner in this PR? When I am extending from Read, I have to add the boilerplate code for the properties inside Read. I think what we can do in this PR is to have only one Read class and then, within expand, call the DoFns needed for unbounded reads only if the right arguments are set. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257068) Time Spent: 4h (was: 3h 50m) > Unbounded HCatalogIO Reader using splittable pardos > --- > > Key: BEAM-7450 > URL: https://issues.apache.org/jira/browse/BEAM-7450 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Ankit Jhalaria >Assignee: Ankit Jhalaria >Priority: Minor > Time Spent: 4h > Remaining Estimate: 0h > > # Current version of HcatalogIO is a bounded source. 
> # While migrating our jobs to AWS, we realized that it would be helpful to > have an unbounded hcat reader that can behave as an unbounded source and > poll for new partitions as and when they become available. > # I have used splittable pardo(s) to do this. There is a flag that, when > set, treats this as a bounded source that terminates once the available > data has been read. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
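The behavior described in the issue — emit each partition once, keep polling for new partitions in unbounded mode, and terminate after the current snapshot when a bounded flag is set — can be sketched as a simple generator. This is an illustration of the control flow only: `list_partitions` is a hypothetical callable standing in for the Hive metastore lookup, not HCatalogIO's real API.

```python
import time

def read_partitions(list_partitions, bounded=False, poll_interval=0.0):
    """Sketch of the described reader: emit each partition once; in
    unbounded mode keep polling for partitions that appear later, and
    in bounded mode stop after draining the first snapshot."""
    emitted = set()
    while True:
        for partition in list_partitions():
            if partition not in emitted:
                emitted.add(partition)
                yield partition
        if bounded:
            return                    # the termination flag from the issue
        time.sleep(poll_interval)
        # A real splittable-DoFn reader would return a
        # ProcessContinuation.resume() here instead of blocking.
```

In the actual splittable-ParDo implementation the "keep polling" branch is expressed by deferring and resuming the restriction rather than by a blocking loop, but the bounded/unbounded split is the same.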
[jira] [Work logged] (BEAM-7450) Unbounded HCatalogIO Reader using splittable pardos
[ https://issues.apache.org/jira/browse/BEAM-7450?focusedWorklogId=257067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257067 ] ASF GitHub Bot logged work on BEAM-7450: Author: ASF GitHub Bot Created on: 10/Jun/19 19:00 Start Date: 10/Jun/19 19:00 Worklog Time Spent: 10m Work Description: jhalaria commented on issue #8718: [BEAM-7450] Add an unbounded HcatalogIO reader using splittable pardo URL: https://github.com/apache/beam/pull/8718#issuecomment-500535619 @iemejia - I think once this PR gets merged, the next step will be to clean up the multiple read methods that we currently have. Moving forward there will only be one Read class that can read both bounded and unbounded data depending on the parameters passed. Do you think we can tackle that as the next step, or if not, what would make it cleaner in this PR? When I am extending from Read, I have to add the boilerplate code for the properties inside Read. I think what we can do in this PR itself is to have only one Read class and then, within expand, call the DoFns needed for unbounded reads only if the right arguments are set. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257067) Time Spent: 3h 50m (was: 3h 40m) > Unbounded HCatalogIO Reader using splittable pardos > --- > > Key: BEAM-7450 > URL: https://issues.apache.org/jira/browse/BEAM-7450 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Ankit Jhalaria >Assignee: Ankit Jhalaria >Priority: Minor > Time Spent: 3h 50m > Remaining Estimate: 0h > > # Current version of HcatalogIO is a bounded source. 
> # While migrating our jobs to AWS, we realized that it would be helpful to > have an unbounded hcat reader that can behave as an unbounded source and > poll for new partitions as and when they become available. > # I have used splittable pardo(s) to do this. There is a flag that, when > set, treats this as a bounded source that terminates once the available > data has been read. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7462) Add Sampled Byte Count Metric to the Java SDK
[ https://issues.apache.org/jira/browse/BEAM-7462?focusedWorklogId=257055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257055 ] ASF GitHub Bot logged work on BEAM-7462: Author: ASF GitHub Bot Created on: 10/Jun/19 18:31 Start Date: 10/Jun/19 18:31 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #8809: [WIP] [BEAM-7462] Update java SDK to report the SampledByteCount counter. URL: https://github.com/apache/beam/pull/8809 Refactor code so that RehydratedComponents is constructed once and provided to PTransformRunnerFactories Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
Post-Commit Tests Status (on master branch)
[jira] [Work logged] (BEAM-7450) Unbounded HCatalogIO Reader using splittable pardos
[ https://issues.apache.org/jira/browse/BEAM-7450?focusedWorklogId=257053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257053 ] ASF GitHub Bot logged work on BEAM-7450: Author: ASF GitHub Bot Created on: 10/Jun/19 18:26 Start Date: 10/Jun/19 18:26 Worklog Time Spent: 10m Work Description: jhalaria commented on issue #8718: [BEAM-7450] Add an unbounded HcatalogIO reader using splittable pardo URL: https://github.com/apache/beam/pull/8718#issuecomment-500535619 @iemejia - I think once this PR gets merged, the next step will be to clean up the multiple read methods that we currently have. Moving forward there will only be one Read class that can read both bounded and unbounded data depending on the parameters passed. Do you think we can tackle that as the next step, or if not, what would make it cleaner in this PR? When I am extending from Read, I have to add the boilerplate code for the properties inside Read. I think what we can do in this PR itself is to have only one Reads class and then, within the expand, call the DoFns needed for unbounded reads only if the right arguments are set. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257053) Time Spent: 3h 40m (was: 3.5h) > Unbounded HCatalogIO Reader using splittable pardos > --- > > Key: BEAM-7450 > URL: https://issues.apache.org/jira/browse/BEAM-7450 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Ankit Jhalaria >Assignee: Ankit Jhalaria >Priority: Minor > Time Spent: 3h 40m > Remaining Estimate: 0h > > # Current version of HcatalogIO is a bounded source. 
> # While migrating our jobs to aws, we realized that it would be helpful to > have an unbounded hcat reader that can behave as an unbounded source and > polls for new partitions as and when they become available. > # I have used splittable pardo(s) to do this. There is a flag that can be > set to treat this as a bounded source which will terminate if that flag is > set. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7274) Protobuf Beam Schema support
[ https://issues.apache.org/jira/browse/BEAM-7274?focusedWorklogId=257051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257051 ] ASF GitHub Bot logged work on BEAM-7274: Author: ASF GitHub Bot Created on: 10/Jun/19 18:22 Start Date: 10/Jun/19 18:22 Worklog Time Spent: 10m Work Description: alexvanboxel commented on issue #8690: [BEAM-7274] Implement the Protobuf schema provider URL: https://github.com/apache/beam/pull/8690#issuecomment-500531285 > Yes. I'd prefer to keep Row as a somewhat sealed class and not introduce extra subtypes. However there were enough other things to review in this PR that I figured I could review those first. Answered (and fixed) the comments; I also rebased the experiment to remove the ProtoRow as a subtype: https://github.com/alexvanboxel/beam/commit/1629557d6a6114b18ef4e3526b01900e5466a6e5 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257051) Time Spent: 4h (was: 3h 50m) > Protobuf Beam Schema support > > > Key: BEAM-7274 > URL: https://issues.apache.org/jira/browse/BEAM-7274 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Minor > Time Spent: 4h > Remaining Estimate: 0h > > Add support for the new Beam Schema to the Protobuf extension. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7478) Remote cluster submission from Flink Runner broken due to staging issues
[ https://issues.apache.org/jira/browse/BEAM-7478?focusedWorklogId=257038&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257038 ] ASF GitHub Bot logged work on BEAM-7478: Author: ASF GitHub Bot Created on: 10/Jun/19 17:57 Start Date: 10/Jun/19 17:57 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8775: [BEAM-7478] Detect class path from the "java.class.path" property URL: https://github.com/apache/beam/pull/8775#issuecomment-500516613 The stack trace didn't give me any insights into how to make this code better than what I had suggested before. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257038) Time Spent: 1.5h (was: 1h 20m) > Remote cluster submission from Flink Runner broken due to staging issues > > > Key: BEAM-7478 > URL: https://issues.apache.org/jira/browse/BEAM-7478 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-java-core >Affects Versions: 2.13.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.14.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > The usual way to submit pipelines with the Flink Runner is to build a fat jar > and use the {{bin/flink}} utility to submit the jar to a Flink cluster. This > works fine. > Alternatively, the Flink Runner can use the {{flinkMaster}} pipeline option > to specify a remote cluster. Upon submitting an example we get the following > at Flink's JobManager. 
> {noformat} > Caused by: java.lang.IllegalAccessError: class > sun.reflect.GeneratedSerializationConstructorAccessor70 cannot access its > superclass sun.reflect.SerializationConstructorAccessorImpl > at sun.misc.Unsafe.defineClass(Native Method) > at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:63) > at > sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399) > at > sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:394) > at java.security.AccessController.doPrivileged(Native Method) > at > sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:393) > at > sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:112) > at > sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:340) > at > java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1420) > at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:72) > at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:497) > at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472) > at java.security.AccessController.doPrivileged(Native Method) > at java.io.ObjectStreamClass.(ObjectStreamClass.java:472) > at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369) > at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:598) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1630) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) > at > org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:502) > at > org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:489) > at > org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:477) > at > org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:438) > at > org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:288) > at > org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:63) > ... 32 more > {noformat} > It appears there
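The fix referenced in the worklog above (PR #8775) detects the files to stage from the JVM's `java.class.path` system property rather than relying on the context classloader. A minimal Python sketch of just the parsing step that approach requires — the function name and defaults here are illustrative, not Beam's actual code:

```python
import os

def classpath_entries(java_class_path, sep=os.pathsep):
    """Split a 'java.class.path'-style string into individual entries.

    Empty segments (e.g. from a trailing separator) are dropped, since
    they do not correspond to a stageable jar or directory.
    """
    return [entry for entry in java_class_path.split(sep) if entry]

# Example: a Linux-style classpath with a trailing separator.
print(classpath_entries("beam-sdk.jar:my-pipeline.jar:", sep=":"))
```

Each surviving entry would then be checked for existence and uploaded as a staging artifact; that filtering step is omitted here.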
[jira] [Created] (BEAM-7523) Add Integration Test for KafkaTable and KafkaTableProvider
Alireza Samadianzakaria created BEAM-7523: - Summary: Add Integration Test for KafkaTable and KafkaTableProvider Key: BEAM-7523 URL: https://issues.apache.org/jira/browse/BEAM-7523 Project: Beam Issue Type: Test Components: dsl-sql Reporter: Alireza Samadianzakaria We need an integration test for KafkaTable and KafkaTableProvider in the SQL module that creates and runs SQL with KafkaTable as its source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7511) KafkaTable Initialization
[ https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=257020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257020 ] ASF GitHub Bot logged work on BEAM-7511: Author: ASF GitHub Bot Created on: 10/Jun/19 17:35 Start Date: 10/Jun/19 17:35 Worklog Time Spent: 10m Work Description: akedin commented on pull request #8797: [BEAM-7511] Fixes the bug in KafkaTable Initialization. URL: https://github.com/apache/beam/pull/8797#discussion_r29277 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java ## @@ -45,26 +46,11 @@ private List topicPartitions; private Map configUpdates; - protected BeamKafkaTable(Schema beamSchema) { -super(beamSchema); - } - public BeamKafkaTable(Schema beamSchema, String bootstrapServers, List topics) { super(beamSchema); this.bootstrapServers = bootstrapServers; this.topics = topics; - } - - public BeamKafkaTable( - Schema beamSchema, List topicPartitions, String bootstrapServers) { -super(beamSchema); -this.bootstrapServers = bootstrapServers; -this.topicPartitions = topicPartitions; - } - - public BeamKafkaTable updateConsumerProperties(Map configUpdates) { Review comment: Thanks. Will merge this PR then if there are no further concerns This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257020) Time Spent: 1h 50m (was: 1h 40m) > KafkaTable Initialization > - > > Key: BEAM-7511 > URL: https://issues.apache.org/jira/browse/BEAM-7511 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > The following exception is thrown when a Kafka table is created: 
> Exception in thread "main" java.lang.NullPointerException > at > org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028) > at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251) > at > org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814) > at > org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.Refere
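The NullPointerException above comes from `KafkaIO.updateKafkaProperties` receiving a config-update map that was never initialized. A minimal Python sketch of the defensive merge pattern that avoids this class of bug — the function name is hypothetical, not Beam's actual API:

```python
def update_consumer_properties(current, updates):
    """Merge config updates into the current consumer properties.

    Treating a missing update map as empty is the guard that the
    stack trace above shows was absent: a None passed where a map
    was expected caused the NullPointerException.
    """
    merged = dict(current)
    merged.update(updates or {})  # None behaves like an empty map
    return merged

base = {"bootstrap.servers": "localhost:9092"}
print(update_consumer_properties(base, None))
```

The equivalent Java fix is to initialize the updates map to an empty map (or skip the merge when it is null) before handing it to KafkaIO.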
[jira] [Work logged] (BEAM-7428) ReadAllViaFileBasedSource does not output the timestamps of the read elements
[ https://issues.apache.org/jira/browse/BEAM-7428?focusedWorklogId=257015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257015 ] ASF GitHub Bot logged work on BEAM-7428: Author: ASF GitHub Bot Created on: 10/Jun/19 17:28 Start Date: 10/Jun/19 17:28 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8741: [BEAM-7428] Output the timestamp on elements in ReadAllViaFileBasedSource URL: https://github.com/apache/beam/pull/8741#issuecomment-500505080 @iemejia How do you resolve the case where you want to track the timestamp of the file as the output timestamp instead of the bounded source's timestamp? For example, the [Watch](https://github.com/apache/beam/blob/9428cc8c6fbffe7190c923850acbd98474210ad7/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L754) transform sets the timestamp on the output. It would make sense that a user may want to use the timestamp there instead of the timestamp stored within the bounded source. (I think in most cases your change is the change that people want.) @chamikaramj It is unlikely that they are different for most sources but the [Watch](https://github.com/apache/beam/blob/9428cc8c6fbffe7190c923850acbd98474210ad7/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L754) transform does meaningfully set it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257015) Time Spent: 3h 50m (was: 3h 40m) > ReadAllViaFileBasedSource does not output the timestamps of the read elements > - > > Key: BEAM-7428 > URL: https://issues.apache.org/jira/browse/BEAM-7428 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 3h 50m > Remaining Estimate: 0h > > This differs from the implementation of JavaReadViaImpulse that tackles a > similar problem but does output the timestamps correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
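The discussion above is about which timestamp each record read from a file should carry: the timestamp of the input file element (e.g. as set by Watch) or a timestamp derived from the bounded source itself. A small Python model of the two behaviors — this is illustrative pseudocode for the semantics, not Beam's implementation:

```python
from datetime import datetime, timezone

# Stand-in for Beam's minimum timestamp, the default for elements
# whose timestamp was never explicitly set.
MIN_TIMESTAMP = datetime(1970, 1, 1, tzinfo=timezone.utc)

def read_all(files, propagate=True):
    """Model of ReadAllViaFileBasedSource's output timestamps.

    Each input is a (filename, timestamp) pair. With propagate=True
    (the behavior BEAM-7428 asks for) the read records inherit the
    file element's timestamp; with propagate=False they fall back to
    the minimum timestamp, losing the information Watch attached.
    """
    out = []
    for name, ts in files:
        for record in [f"{name}:rec0", f"{name}:rec1"]:  # fake file contents
            out.append((record, ts if propagate else MIN_TIMESTAMP))
    return out
```

Under this model, downstream windowing only sees meaningful event times in the propagating case, which is why the issue calls the non-propagating behavior a bug for most pipelines.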
[jira] [Work logged] (BEAM-7428) ReadAllViaFileBasedSource does not output the timestamps of the read elements
[ https://issues.apache.org/jira/browse/BEAM-7428?focusedWorklogId=257014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257014 ] ASF GitHub Bot logged work on BEAM-7428: Author: ASF GitHub Bot Created on: 10/Jun/19 17:27 Start Date: 10/Jun/19 17:27 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8741: [BEAM-7428] Output the timestamp on elements in ReadAllViaFileBasedSource URL: https://github.com/apache/beam/pull/8741#issuecomment-500505080 @iemejia How do you resolve the case where you want to track the timestamp of the file as the output timestamp instead of the bounded source's timestamp? For example, the [Watch](https://github.com/apache/beam/blob/9428cc8c6fbffe7190c923850acbd98474210ad7/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L754) transform sets the timestamp on the output. It would make sense that a user may want to use the timestamp there instead of the timestamp stored within the bounded source. I think in most cases your change is the change that people want. @chamikaramj It is unlikely that they are different for most sources but the [Watch](https://github.com/apache/beam/blob/9428cc8c6fbffe7190c923850acbd98474210ad7/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L754) transform does meaningfully set it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257014) Time Spent: 3h 40m (was: 3.5h) > ReadAllViaFileBasedSource does not output the timestamps of the read elements > - > > Key: BEAM-7428 > URL: https://issues.apache.org/jira/browse/BEAM-7428 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 3h 40m > Remaining Estimate: 0h > > This differs from the implementation of JavaReadViaImpulse that tackles a > similar problem but does output the timestamps correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7450) Unbounded HCatalogIO Reader using splittable pardos
[ https://issues.apache.org/jira/browse/BEAM-7450?focusedWorklogId=257012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257012 ] ASF GitHub Bot logged work on BEAM-7450: Author: ASF GitHub Bot Created on: 10/Jun/19 17:26 Start Date: 10/Jun/19 17:26 Worklog Time Spent: 10m Work Description: jhalaria commented on pull request #8718: [BEAM-7450] Add an unbounded HcatalogIO reader using splittable pardo URL: https://github.com/apache/beam/pull/8718#discussion_r292107723 ## File path: sdks/java/io/hcatalog/src/main/java/org/apache/beam/sdk/io/hcatalog/PartitionPollerFn.java ## @@ -0,0 +1,146 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.hcatalog; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import org.apache.beam.sdk.io.hcatalog.HCatalogIO.Read; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.DoFn.UnboundedPerElement; +import org.apache.beam.sdk.transforms.SerializableComparator; +import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; +import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.metastore.IMetaStoreClient; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hive.hcatalog.common.HCatUtil; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** Unbounded poller to listen for new partitions. */ +@UnboundedPerElement +class PartitionPollerFn extends DoFn { + + private static final Logger LOG = LoggerFactory.getLogger(PartitionPollerFn.class); + private transient IMetaStoreClient metaStoreClient; + private Map configProperties; + private String database; + private String table; + final Boolean shouldTreatSourceAsBounded; + + public PartitionPollerFn( + final Map properties, + final String database, + final String table, + final Boolean isBounded) { +this.configProperties = properties; +this.database = database; +this.table = table; +this.shouldTreatSourceAsBounded = isBounded; + } + + /** Outputs newer partitions to be processed by the reader. 
*/ + @ProcessElement + @SuppressWarnings("unused") + public ProcessContinuation processElement( + final ProcessContext c, RestrictionTracker tracker) { +final Read readRequest = c.element(); +final PartitionRange range = tracker.currentRestriction(); +final ImmutableList allAvailablePartitions = range.getPartitions(); + +List forSort = new ArrayList<>(allAvailablePartitions); +Collections.sort(forSort, readRequest.getPartitionComparator()); + +int trueStart; +if (tracker.currentRestriction().getLastCompletedPartition() != null) { + final int indexOfLastCompletedPartition = + forSort.indexOf(tracker.currentRestriction().getLastCompletedPartition()); + trueStart = indexOfLastCompletedPartition + 1; +} else { + trueStart = 0; +} + +for (int i = trueStart; i < forSort.size(); i++) { + if (tracker.tryClaim(forSort.get(i))) { +c.output( +new HCatRecordReaderFn.PartitionWrapper( Review comment: Good point. After the refactor, since everything is now in Read(s), this can go away. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257012) Time Spent: 3.5h (was: 3h 20m) > Unbounded HCatalogIO Reader using splittable pardos > --- > > Key: BEAM-7450 > URL: https://issues.apache.org/jira/browse/BEAM-7450 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >
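The `processElement` in the diff above sorts all discovered partitions and resumes after the last completed one, claiming each remaining partition through the restriction tracker. A simplified Python model of that claim loop — the tracker and `tryClaim` are stand-ins, and a failed claim is approximated as a checkpoint that stops the loop; this is not the Beam splittable DoFn API:

```python
def poll_partitions(all_partitions, last_completed, try_claim, key=None):
    """Simplified model of PartitionPollerFn.processElement.

    Sorts the discovered partitions, skips everything up to and
    including last_completed (assumed to be in the list when not None),
    and emits each partition the tracker lets us claim. A refused
    claim means the restriction was checkpointed or split, so we
    stop and let the runner resume later.
    """
    ordered = sorted(all_partitions, key=key)
    start = ordered.index(last_completed) + 1 if last_completed is not None else 0
    emitted = []
    for part in ordered[start:]:
        if not try_claim(part):  # stand-in for RestrictionTracker.tryClaim
            break
        emitted.append(part)
    return emitted
```

In the real DoFn, the unbounded case would additionally return a `ProcessContinuation.resume()` with a poll delay so the fn is re-invoked to discover partitions created after this pass.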
[jira] [Comment Edited] (BEAM-6860) WriteToText crash with "GlobalWindow -> ._IntervalWindowBase"
[ https://issues.apache.org/jira/browse/BEAM-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860164#comment-16860164 ] Paul edited comment on BEAM-6860 at 6/10/19 5:08 PM: - FWIW I'm having this same issue on the apache_beam 2.10, 2.11, 2.12, 2.13 Python SDK; I confirmed that this works on 2.9.0 as well, which I saw in a Stack Overflow thread here: [https://stackoverflow.com/questions/55109403/apache-beam-python-sdk-upgrade-issue]. It also fails not only on WriteToText, but also WriteToBigQuery. Specifically, this does not work when implementing windowing on a bounded data set and adding element timestamps. While doing this, I was working off of this example: [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/game/hourly_team_score.py] However, this example actually works in later versions of the beam python SDK, so I'm wondering if this is a user-issue with a very poor error message or if we're explicitly/implicitly using a feature of beam that the example does not. Environment was: centos7 linux, kernel 3.10 python 2.7.16 apache beam versions: 2.9.0-2.13.0 was (Author: pshahid): FWIW I'm having this same issue on the apache_beam 2.10, 2.11, 2.12, 2.13 Python SDK; I confirmed that this works on 2.9.0 as well, which I saw in a Stack Overflow thread here: [https://stackoverflow.com/questions/55109403/apache-beam-python-sdk-upgrade-issue]. It also fails not only on WriteToText, but also WriteToBigQuery. Specifically, this does not work when implementing windowing on a bounded data set and adding element timestamps. 
While doing this, I was working off of this example: [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/game/hourly_team_score.py] However, this example actually works in later versions of the beam python SDK, so I'm wondering if this is a user-issue with a very poor error message or if we're explicitly/implicitly using a feature of beam that the example does not. > WriteToText crash with "GlobalWindow -> ._IntervalWindowBase" > - > > Key: BEAM-6860 > URL: https://issues.apache.org/jira/browse/BEAM-6860 > Project: Beam > Issue Type: Bug > Components: beam-model >Affects Versions: 2.11.0 > Environment: macOS, DirectRunner, python 2.7.15 via > pyenv/pyenv-virtualenv >Reporter: Henrik >Priority: Major > Labels: newbie > > Main error: > > Cannot convert GlobalWindow to > > apache_beam.utils.windowed_value._IntervalWindowBase > This is very hard for me to debug. Doing a DoPar call before, printing the > input, gives me just what I want; so the lines of data to serialise are > "alright"; just JSON strings, in fact. 
> Stacktrace: > {code:java} > Traceback (most recent call last): > File "./okr_end_ride.py", line 254, in > run() > File "./okr_end_ride.py", line 250, in run > run_pipeline(pipeline_options, known_args) > File "./okr_end_ride.py", line 198, in run_pipeline > | 'write_all' >> WriteToText(known_args.output, > file_name_suffix=".txt") > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 426, in __exit__ > self.run().wait_until_finish() > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 406, in run > self._options).run(False) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 419, in run > return self.runner.run_pipeline(self, self._options) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/direct/direct_runner.py", > line 132, in run_pipeline > return runner.run_pipeline(pipeline, options) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 275, in run_pipeline > default_environment=self._default_environment)) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 278, in run_via_runner_api > return self.run_stages(*self.create_stages(pipeline_proto)) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 354, in run_stages > stage_context.safe_coders) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 509, in run_stage > data_input, data_output) > File > "/Users/h/.pyenv/versions/2.7.15/envs/l
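The failing pattern described above is windowing a bounded data set after adding element timestamps. What `beam.WindowInto(FixedWindows(size))` assigns to each element can be modeled in plain Python; once elements carry such interval windows, one plausible reading of the "Cannot convert GlobalWindow to ... _IntervalWindowBase" error is that a globally-windowed value still reaches a coder expecting interval windows. This sketch only models the window-assignment arithmetic, not Beam's coders:

```python
def assign_fixed_window(timestamp, size):
    """Return the [start, end) fixed window containing the timestamp,
    mirroring the arithmetic of FixedWindows(size): windows tile the
    time axis at multiples of `size` (offset 0 assumed).
    """
    start = timestamp - (timestamp % size)
    return (start, start + size)

# An event 5 seconds into an hour lands in that hour's window.
assert assign_fixed_window(3605, 3600) == (3600, 7200)
```

Comparing the element timestamps your pipeline assigns against the windows this arithmetic produces is a cheap sanity check before debugging the coder error itself.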
[jira] [Commented] (BEAM-6860) WriteToText crash with "GlobalWindow -> ._IntervalWindowBase"
[ https://issues.apache.org/jira/browse/BEAM-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860164#comment-16860164 ] Paul commented on BEAM-6860: FWIW I'm having this same issue on apache_beam 2.10, 2.11, 2.12, 2.13 python sdk; I confirmed that this works on 2.9.0 as well, which I saw in a stack overflow thread here ([https://stackoverflow.com/questions/55109403/apache-beam-python-sdk-upgrade-issue).] It also fails not only on WriteToText, but also WriteToBigQuery. Specifically, this does not work when implementing windowing on a bounded data set and adding element timestamps. While doing this, I was working off of this example: [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/game/hourly_team_score.py] However, this example actually works in later versions of the beam python SDK, so I'm wondering if this is a user-issue with a very poor error message or if we're explicitly/implicitly using a feature of beam that the example does not. > WriteToText crash with "GlobalWindow -> ._IntervalWindowBase" > - > > Key: BEAM-6860 > URL: https://issues.apache.org/jira/browse/BEAM-6860 > Project: Beam > Issue Type: Bug > Components: beam-model >Affects Versions: 2.11.0 > Environment: macOS, DirectRunner, python 2.7.15 via > pyenv/pyenv-virtualenv >Reporter: Henrik >Priority: Major > Labels: newbie > > Main error: > > Cannot convert GlobalWindow to > > apache_beam.utils.windowed_value._IntervalWindowBase > This is very hard for me to debug. Doing a DoPar call before, printing the > input, gives me just what I want; so the lines of data to serialise are > "alright"; just JSON strings, in fact. 
> Stacktrace: > {code:java} > Traceback (most recent call last): > File "./okr_end_ride.py", line 254, in > run() > File "./okr_end_ride.py", line 250, in run > run_pipeline(pipeline_options, known_args) > File "./okr_end_ride.py", line 198, in run_pipeline > | 'write_all' >> WriteToText(known_args.output, > file_name_suffix=".txt") > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 426, in __exit__ > self.run().wait_until_finish() > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 406, in run > self._options).run(False) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 419, in run > return self.runner.run_pipeline(self, self._options) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/direct/direct_runner.py", > line 132, in run_pipeline > return runner.run_pipeline(pipeline, options) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 275, in run_pipeline > default_environment=self._default_environment)) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 278, in run_via_runner_api > return self.run_stages(*self.create_stages(pipeline_proto)) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 354, in run_stages > stage_context.safe_coders) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 509, in run_stage > data_input, data_output) > File > 
"/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 1206, in process_bundle > result_future = self._controller.control_handler.push(process_bundle) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 821, in push > response = self.worker.do_instruction(request) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 265, in do_instruction > request.instruction_id) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 281, in process_bundle > delayed_applications = bundle_processor.process_bundle(instruction_id) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 552, in process_bun
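The failing pattern in this report is fixed windowing over a bounded collection with explicitly assigned element timestamps, as in the hourly_team_score example. As a language-agnostic illustration only (not Beam's implementation, and not the fix), the fixed-window assignment that such a pipeline performs is just integer arithmetic on the element timestamp; `window_size_s` is an assumed parameter:

```python
def assign_fixed_window(timestamp_s, window_size_s=3600):
    """Return the [start, end) interval of the fixed window an element
    with the given timestamp falls into. Illustrative sketch of the
    windowing applied before WriteToText; not Beam code."""
    start = timestamp_s - (timestamp_s % window_size_s)
    return (start, start + window_size_s)
```

The crash itself is in the runner's coder plumbing (a GlobalWindow reaching a coder that expects interval windows like the ones above), which is why the pipeline logic looks correct to the reporter; per the comments, pinning to SDK 2.9.0 reportedly avoids it.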
[jira] [Work logged] (BEAM-7511) KafkaTable Initialization
[ https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=256996&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256996 ] ASF GitHub Bot logged work on BEAM-7511: Author: ASF GitHub Bot Created on: 10/Jun/19 16:59 Start Date: 10/Jun/19 16:59 Worklog Time Spent: 10m Work Description: XuMingmin commented on pull request #8797: [BEAM-7511] Fixes the bug in KafkaTable Initialization. URL: https://github.com/apache/beam/pull/8797#discussion_r292097650 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java ## @@ -45,26 +46,11 @@ private List topicPartitions; private Map configUpdates; - protected BeamKafkaTable(Schema beamSchema) { -super(beamSchema); - } - public BeamKafkaTable(Schema beamSchema, String bootstrapServers, List topics) { super(beamSchema); this.bootstrapServers = bootstrapServers; this.topics = topics; - } - - public BeamKafkaTable( - Schema beamSchema, List topicPartitions, String bootstrapServers) { -super(beamSchema); -this.bootstrapServers = bootstrapServers; -this.topicPartitions = topicPartitions; - } - - public BeamKafkaTable updateConsumerProperties(Map configUpdates) { Review comment: surely, create https://issues.apache.org/jira/browse/BEAM-7522 to track. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 256996) Time Spent: 1h 40m (was: 1.5h) > KafkaTable Initialization > - > > Key: BEAM-7511 > URL: https://issues.apache.org/jira/browse/BEAM-7511 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > This exception is thrown when a kafka table is created because. > Exception in thread "main" java.lang.NullPointerException > at > org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028) > at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251) > at > org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814) > at > org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stre
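The NullPointerException arises because `configUpdates` is never initialized when the table is built through the provider, yet `KafkaIO.updateKafkaProperties` dereferences it. The defensive shape of the fix — treat a missing update map as empty before merging — can be sketched as follows (plain Python, with illustrative function names, not Beam's API):

```python
def merged_consumer_config(base_config, config_updates=None):
    """Merge optional user overrides into a Kafka consumer config,
    treating a missing update map as empty rather than dereferencing
    None (the analogue of the NPE in KafkaIO.updateKafkaProperties)."""
    merged = dict(base_config)          # never mutate the caller's map
    merged.update(config_updates or {}) # None behaves like an empty map
    return merged
```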
[jira] [Created] (BEAM-7522) support customized configuration in KafkaTableProvider
Xu Mingmin created BEAM-7522: Summary: support customized configuration in KafkaTableProvider Key: BEAM-7522 URL: https://issues.apache.org/jira/browse/BEAM-7522 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Xu Mingmin expand KafkaTableProvider to support {{BeamKafkaTable.updateConsumerProperties(...)}}, so users can add customized configurations in DDL. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
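The requested improvement would let DDL-supplied table properties flow through to `BeamKafkaTable.updateConsumerProperties(...)`. A hedged sketch of that plumbing — extracting consumer overrides from a table definition's JSON properties blob; the `"properties"` key is an assumption for illustration, not the actual DDL schema:

```python
import json

def consumer_updates_from_table_properties(tbl_properties_json):
    """Parse a table definition's properties blob into a map of Kafka
    consumer overrides that could be applied via
    updateConsumerProperties(...). The "properties" key is illustrative."""
    props = json.loads(tbl_properties_json)
    return props.get("properties", {})
```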
[jira] [Work logged] (BEAM-7043) Add DynamoDBIO
[ https://issues.apache.org/jira/browse/BEAM-7043?focusedWorklogId=256979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256979 ] ASF GitHub Bot logged work on BEAM-7043: Author: ASF GitHub Bot Created on: 10/Jun/19 16:49 Start Date: 10/Jun/19 16:49 Worklog Time Spent: 10m Work Description: cmachgodaddy commented on pull request #8390: [BEAM-7043] Add DynamoDBIO URL: https://github.com/apache/beam/pull/8390#discussion_r292093933 ## File path: sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIO.java ## @@ -0,0 +1,501 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.aws.dynamodb; + +import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument; + +import com.amazonaws.auth.AWSStaticCredentialsProvider; +import com.amazonaws.auth.BasicAWSCredentials; +import com.amazonaws.client.builder.AwsClientBuilder; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDB; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder; +import com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException; +import com.amazonaws.services.dynamodbv2.model.AttributeValue; +import com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest; +import com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult; +import com.amazonaws.services.dynamodbv2.model.ScanRequest; +import com.google.auto.value.AutoValue; +import java.io.IOException; +import java.io.Serializable; +import java.util.Map; +import java.util.function.Predicate; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.util.BackOff; +import org.apache.beam.sdk.util.BackOffUtils; +import org.apache.beam.sdk.util.FluentBackoff; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableSet; +import 
org.apache.http.HttpStatus; +import org.joda.time.Duration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for writing to https://aws.amazon.com/dynamodb/";>DynamoDB. + * + * Writing to DynamoDB + * + * Example usage: + * + * {@code + * DynamoDBIO.DynamoDBConfiguration config = DynamoDBIO.DynamoDBConfiguration.create( + * "region", "accessKey", "secretKey"); + * PCollection data = ...; + * + * data.apply(DynamoDBIO.write() + * .withRetryConfiguration( + *DynamoDBIO.RetryConfiguration.create( + * 4, org.joda.time.Duration.standardSeconds(10))) + * .withDynamoDBConfiguration(config) + * .withResultOutputTag(results)); + * } + * + * As a client, you need to provide at least the following things: + * + * + * Retry configuration + * DynamoDb configuration + * An output tag where you can get results. Example in DynamoDBIOTest + * + * + * Reading from DynamoDB + * + * Example usage: + * + * {@code + * DynamoDBIO.DynamoDBConfiguration config = DynamoDBIO.DynamoDBConfiguration.create( + * "endpointUrl", "region", "accessKey", "secretKey"); + * PCollection> actual = + * pipeline.apply(DynamoDBIO.read() + * .withScanRequestFn((v) -> new ScanRequest(tableName).withTotalSegment(10)) + * .withDynamoDBConfiguration(config)); + * } + * + * As a client, you need to provide at least the following things: + * + * + * DynamoDb configuration + * ScanReq
[jira] [Work logged] (BEAM-7043) Add DynamoDBIO
[ https://issues.apache.org/jira/browse/BEAM-7043?focusedWorklogId=256961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256961 ] ASF GitHub Bot logged work on BEAM-7043: Author: ASF GitHub Bot Created on: 10/Jun/19 16:42 Start Date: 10/Jun/19 16:42 Worklog Time Spent: 10m Work Description: cmachgodaddy commented on pull request #8390: [BEAM-7043] Add DynamoDBIO URL: https://github.com/apache/beam/pull/8390#discussion_r292090983 ## File path: sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIO.java ## @@ -0,0 +1,501 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.aws.dynamodb; + +import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument; + +import com.amazonaws.auth.AWSStaticCredentialsProvider; +import com.amazonaws.auth.BasicAWSCredentials; +import com.amazonaws.client.builder.AwsClientBuilder; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDB; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder; +import com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException; +import com.amazonaws.services.dynamodbv2.model.AttributeValue; +import com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest; +import com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult; +import com.amazonaws.services.dynamodbv2.model.ScanRequest; +import com.google.auto.value.AutoValue; +import java.io.IOException; +import java.io.Serializable; +import java.util.Map; +import java.util.function.Predicate; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.util.BackOff; +import org.apache.beam.sdk.util.BackOffUtils; +import org.apache.beam.sdk.util.FluentBackoff; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableSet; +import 
org.apache.http.HttpStatus; +import org.joda.time.Duration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for writing to https://aws.amazon.com/dynamodb/";>DynamoDB. + * + * Writing to DynamoDB + * + * Example usage: + * + * {@code + * DynamoDBIO.DynamoDBConfiguration config = DynamoDBIO.DynamoDBConfiguration.create( + * "region", "accessKey", "secretKey"); + * PCollection data = ...; + * + * data.apply(DynamoDBIO.write() + * .withRetryConfiguration( + *DynamoDBIO.RetryConfiguration.create( + * 4, org.joda.time.Duration.standardSeconds(10))) + * .withDynamoDBConfiguration(config) + * .withResultOutputTag(results)); + * } + * + * As a client, you need to provide at least the following things: + * + * + * Retry configuration + * DynamoDb configuration + * An output tag where you can get results. Example in DynamoDBIOTest + * + * + * Reading from DynamoDB + * + * Example usage: + * + * {@code + * DynamoDBIO.DynamoDBConfiguration config = DynamoDBIO.DynamoDBConfiguration.create( + * "endpointUrl", "region", "accessKey", "secretKey"); + * PCollection> actual = + * pipeline.apply(DynamoDBIO.read() + * .withScanRequestFn((v) -> new ScanRequest(tableName).withTotalSegment(10)) + * .withDynamoDBConfiguration(config)); + * } + * + * As a client, you need to provide at least the following things: + * + * + * DynamoDb configuration + * ScanReq
[jira] [Work logged] (BEAM-7043) Add DynamoDBIO
[ https://issues.apache.org/jira/browse/BEAM-7043?focusedWorklogId=256964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256964 ] ASF GitHub Bot logged work on BEAM-7043: Author: ASF GitHub Bot Created on: 10/Jun/19 16:42 Start Date: 10/Jun/19 16:42 Worklog Time Spent: 10m Work Description: iemejia commented on issue #8390: [BEAM-7043] Add DynamoDBIO URL: https://github.com/apache/beam/pull/8390#issuecomment-500488384 @cmachgodaddy I think I answered all the pending questions. Please write me if you still have questions or comments. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 256964) Time Spent: 15h (was: 14h 50m) > Add DynamoDBIO > -- > > Key: BEAM-7043 > URL: https://issues.apache.org/jira/browse/BEAM-7043 > Project: Beam > Issue Type: New Feature > Components: io-java-aws >Reporter: Cam Mach >Assignee: Cam Mach >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 15h > Remaining Estimate: 0h > > Currently we don't have any feature to write data to AWS DynamoDB. This > feature will enable us to send data to DynamoDB -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7043) Add DynamoDBIO
[ https://issues.apache.org/jira/browse/BEAM-7043?focusedWorklogId=256959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256959 ] ASF GitHub Bot logged work on BEAM-7043: Author: ASF GitHub Bot Created on: 10/Jun/19 16:41 Start Date: 10/Jun/19 16:41 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #8390: [BEAM-7043] Add DynamoDBIO URL: https://github.com/apache/beam/pull/8390#discussion_r292090757 ## File path: sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIO.java ## @@ -0,0 +1,501 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.aws.dynamodb; + +import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument; + +import com.amazonaws.auth.AWSStaticCredentialsProvider; +import com.amazonaws.auth.BasicAWSCredentials; +import com.amazonaws.client.builder.AwsClientBuilder; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDB; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder; +import com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException; +import com.amazonaws.services.dynamodbv2.model.AttributeValue; +import com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest; +import com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult; +import com.amazonaws.services.dynamodbv2.model.ScanRequest; +import com.google.auto.value.AutoValue; +import java.io.IOException; +import java.io.Serializable; +import java.util.Map; +import java.util.function.Predicate; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.util.BackOff; +import org.apache.beam.sdk.util.BackOffUtils; +import org.apache.beam.sdk.util.FluentBackoff; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableSet; +import 
org.apache.http.HttpStatus; +import org.joda.time.Duration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for writing to https://aws.amazon.com/dynamodb/";>DynamoDB. + * + * Writing to DynamoDB + * + * Example usage: + * + * {@code + * DynamoDBIO.DynamoDBConfiguration config = DynamoDBIO.DynamoDBConfiguration.create( + * "region", "accessKey", "secretKey"); + * PCollection data = ...; + * + * data.apply(DynamoDBIO.write() + * .withRetryConfiguration( + *DynamoDBIO.RetryConfiguration.create( + * 4, org.joda.time.Duration.standardSeconds(10))) + * .withDynamoDBConfiguration(config) + * .withResultOutputTag(results)); + * } + * + * As a client, you need to provide at least the following things: + * + * + * Retry configuration + * DynamoDb configuration + * An output tag where you can get results. Example in DynamoDBIOTest + * + * + * Reading from DynamoDB + * + * Example usage: + * + * {@code + * DynamoDBIO.DynamoDBConfiguration config = DynamoDBIO.DynamoDBConfiguration.create( + * "endpointUrl", "region", "accessKey", "secretKey"); + * PCollection> actual = + * pipeline.apply(DynamoDBIO.read() + * .withScanRequestFn((v) -> new ScanRequest(tableName).withTotalSegment(10)) + * .withDynamoDBConfiguration(config)); + * } + * + * As a client, you need to provide at least the following things: + * + * + * DynamoDb configuration + * ScanRequestF
[jira] [Work logged] (BEAM-7043) Add DynamoDBIO
[ https://issues.apache.org/jira/browse/BEAM-7043?focusedWorklogId=256967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256967 ] ASF GitHub Bot logged work on BEAM-7043: Author: ASF GitHub Bot Created on: 10/Jun/19 16:42 Start Date: 10/Jun/19 16:42 Worklog Time Spent: 10m Work Description: cmachgodaddy commented on pull request #8390: [BEAM-7043] Add DynamoDBIO URL: https://github.com/apache/beam/pull/8390#discussion_r292090983 ## File path: sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIO.java ## @@ -0,0 +1,501 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.aws.dynamodb; + +import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument; + +import com.amazonaws.auth.AWSStaticCredentialsProvider; +import com.amazonaws.auth.BasicAWSCredentials; +import com.amazonaws.client.builder.AwsClientBuilder; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDB; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder; +import com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException; +import com.amazonaws.services.dynamodbv2.model.AttributeValue; +import com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest; +import com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult; +import com.amazonaws.services.dynamodbv2.model.ScanRequest; +import com.google.auto.value.AutoValue; +import java.io.IOException; +import java.io.Serializable; +import java.util.Map; +import java.util.function.Predicate; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.util.BackOff; +import org.apache.beam.sdk.util.BackOffUtils; +import org.apache.beam.sdk.util.FluentBackoff; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableSet; +import 
org.apache.http.HttpStatus; +import org.joda.time.Duration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for writing to https://aws.amazon.com/dynamodb/";>DynamoDB. + * + * Writing to DynamoDB + * + * Example usage: + * + * {@code + * DynamoDBIO.DynamoDBConfiguration config = DynamoDBIO.DynamoDBConfiguration.create( + * "region", "accessKey", "secretKey"); + * PCollection data = ...; + * + * data.apply(DynamoDBIO.write() + * .withRetryConfiguration( + *DynamoDBIO.RetryConfiguration.create( + * 4, org.joda.time.Duration.standardSeconds(10))) + * .withDynamoDBConfiguration(config) + * .withResultOutputTag(results)); + * } + * + * As a client, you need to provide at least the following things: + * + * + * Retry configuration + * DynamoDb configuration + * An output tag where you can get results. Example in DynamoDBIOTest + * + * + * Reading from DynamoDB + * + * Example usage: + * + * {@code + * DynamoDBIO.DynamoDBConfiguration config = DynamoDBIO.DynamoDBConfiguration.create( + * "endpointUrl", "region", "accessKey", "secretKey"); + * PCollection> actual = + * pipeline.apply(DynamoDBIO.read() + * .withScanRequestFn((v) -> new ScanRequest(tableName).withTotalSegment(10)) + * .withDynamoDBConfiguration(config)); + * } + * + * As a client, you need to provide at least the following things: + * + * + * DynamoDb configuration + * ScanReq
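The quoted javadoc configures `RetryConfiguration.create(4, Duration.standardSeconds(10))` — a cap on retry attempts and on total retry duration. A minimal sketch of that bounded-retry-with-backoff shape (plain Python, not DynamoDBIO's FluentBackoff implementation; the delay values are assumptions):

```python
import time

def call_with_retries(call, max_attempts=4, max_duration_s=10.0, base_delay_s=0.05):
    """Retry `call` with exponential backoff, giving up after
    max_attempts tries or once max_duration_s has elapsed -- the two
    knobs DynamoDBIO's RetryConfiguration.create(maxAttempts, maxDuration)
    exposes."""
    start = time.monotonic()
    delay = base_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            out_of_time = time.monotonic() - start >= max_duration_s
            if attempt == max_attempts or out_of_time:
                raise  # retry budget exhausted; surface the error
            time.sleep(delay)
            delay *= 2  # exponential backoff between attempts
```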
[jira] [Work logged] (BEAM-7511) KafkaTable Initialization
[ https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=256969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256969 ] ASF GitHub Bot logged work on BEAM-7511: Author: ASF GitHub Bot Created on: 10/Jun/19 16:43 Start Date: 10/Jun/19 16:43 Worklog Time Spent: 10m Work Description: akedin commented on pull request #8797: [BEAM-7511] Fixes the bug in KafkaTable Initialization. URL: https://github.com/apache/beam/pull/8797#discussion_r292091591 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java ## @@ -45,26 +46,11 @@ private List topicPartitions; private Map configUpdates; - protected BeamKafkaTable(Schema beamSchema) { -super(beamSchema); - } - public BeamKafkaTable(Schema beamSchema, String bootstrapServers, List topics) { super(beamSchema); this.bootstrapServers = bootstrapServers; this.topics = topics; - } - - public BeamKafkaTable( - Schema beamSchema, List topicPartitions, String bootstrapServers) { -super(beamSchema); -this.bootstrapServers = bootstrapServers; -this.topicPartitions = topicPartitions; - } - - public BeamKafkaTable updateConsumerProperties(Map configUpdates) { Review comment: @XuMingmin thank you for pointing it out. I think exposing these methods in `KafkaTableProvider` is probably out of scope of this PR. These methods also would not be used in the existing code. If you rely on these in some customized version, can you contribute some of it back with tests and maybe examples? So that it doesn't get deleted by accident? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 256969)
Time Spent: 1.5h (was: 1h 20m)

> KafkaTable Initialization
> -
>
> Key: BEAM-7511
> URL: https://issues.apache.org/jira/browse/BEAM-7511
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql
> Reporter: Alireza Samadianzakaria
> Assignee: Alireza Samadianzakaria
> Priority: Minor
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> This exception is thrown when a Kafka table is created:
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028)
> at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251)
> at org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814)
> at org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89)
> at org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67)
> at org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488)
> at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66)
> at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
> at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48)
> at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64)
> at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
> at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipe
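The stack trace above fails inside `KafkaIO.updateKafkaProperties` because the table's consumer-config map is never initialized on the code path that creates the table. A minimal plain-Java sketch of the failure shape and the usual fix (these class and method names are hypothetical stand-ins, not the actual Beam classes):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a table that carries consumer config overrides.
class TableSketch {
    // The fix pattern: initialize eagerly (or null-check before use) so the
    // downstream merge never sees null. Leaving this field null reproduces
    // the NPE reported at KafkaIO.updateKafkaProperties(KafkaIO.java:1028).
    private Map<String, Object> configUpdates = new HashMap<>();

    Map<String, Object> buildConsumerConfig(Map<String, Object> defaults) {
        Map<String, Object> config = new HashMap<>(defaults);
        // With a null configUpdates this loop would throw NullPointerException.
        for (Map.Entry<String, Object> e : configUpdates.entrySet()) {
            config.put(e.getKey(), e.getValue());
        }
        return config;
    }

    public static void main(String[] args) {
        TableSketch table = new TableSketch();
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("bootstrap.servers", "localhost:9092");
        System.out.println(table.buildConsumerConfig(defaults)); // no NPE
    }
}
```

Either eager initialization (as above) or a null check before the merge removes the failure; the PR under review takes the initialization route in `BeamKafkaTable`.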
[jira] [Work logged] (BEAM-7511) KafkaTable Initialization
[ https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=256954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256954 ]

ASF GitHub Bot logged work on BEAM-7511:
Author: ASF GitHub Bot
Created on: 10/Jun/19 16:35
Start Date: 10/Jun/19 16:35
Worklog Time Spent: 10m

Work Description: riazela commented on pull request #8797: [BEAM-7511] Fixes the bug in KafkaTable Initialization.
URL: https://github.com/apache/beam/pull/8797#discussion_r292088562

## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java
## @@ -45,26 +46,11 @@
   private List topicPartitions;
   private Map configUpdates;

-  protected BeamKafkaTable(Schema beamSchema) {
-    super(beamSchema);
-  }
-
   public BeamKafkaTable(Schema beamSchema, String bootstrapServers, List topics) {
     super(beamSchema);
     this.bootstrapServers = bootstrapServers;
     this.topics = topics;
   }
-
-  public BeamKafkaTable(
-      Schema beamSchema, List topicPartitions, String bootstrapServers) {
-    super(beamSchema);
-    this.bootstrapServers = bootstrapServers;
-    this.topicPartitions = topicPartitions;
-  }
-
-  public BeamKafkaTable updateConsumerProperties(Map configUpdates) {

Review comment: I restored the deleted methods so that we can support them later in `KafkaTableProvider`.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
Issue Time Tracking
---
Worklog Id: (was: 256954)
Time Spent: 1h 20m (was: 1h 10m)

> KafkaTable Initialization
> Key: BEAM-7511
> URL: https://issues.apache.org/jira/browse/BEAM-7511
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-7368) Run Python GBK load tests on portable Flink runner
[ https://issues.apache.org/jira/browse/BEAM-7368?focusedWorklogId=256950&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256950 ]

ASF GitHub Bot logged work on BEAM-7368:
Author: ASF GitHub Bot
Created on: 10/Jun/19 16:32
Start Date: 10/Jun/19 16:32
Worklog Time Spent: 10m

Work Description: lgajowy commented on issue #8636: [BEAM-7368] Flink + Python + gbk load test
URL: https://github.com/apache/beam/pull/8636#issuecomment-500484640
@mxm @angoenka I provided fixes. Could you take a look?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 256950)
Time Spent: 10h 50m (was: 10h 40m)

> Run Python GBK load tests on portable Flink runner
> --
>
> Key: BEAM-7368
> URL: https://issues.apache.org/jira/browse/BEAM-7368
> Project: Beam
> Issue Type: Sub-task
> Components: testing
> Reporter: Lukasz Gajowy
> Assignee: Lukasz Gajowy
> Priority: Major
> Time Spent: 10h 50m
> Remaining Estimate: 0h

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7428) ReadAllViaFileBasedSource does not output the timestamps of the read elements
[ https://issues.apache.org/jira/browse/BEAM-7428?focusedWorklogId=256951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256951 ]

ASF GitHub Bot logged work on BEAM-7428:
Author: ASF GitHub Bot
Created on: 10/Jun/19 16:32
Start Date: 10/Jun/19 16:32
Worklog Time Spent: 10m

Work Description: chamikaramj commented on issue #8741: [BEAM-7428] Output the timestamp on elements in ReadAllViaFileBasedSource
URL: https://github.com/apache/beam/pull/8741#issuecomment-500484673
I'm not sure where the difference in behavior comes from, but is it possible that the input timestamps in the two cases are somehow different (-inf vs. current time)? @jkff or @lukecwik might have more context.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 256951)
Time Spent: 3.5h (was: 3h 20m)

> ReadAllViaFileBasedSource does not output the timestamps of the read elements
> -
>
> Key: BEAM-7428
> URL: https://issues.apache.org/jira/browse/BEAM-7428
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Ismaël Mejía
> Assignee: Ismaël Mejía
> Priority: Minor
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> This differs from the implementation of JavaReadViaImpulse, which tackles a similar problem but does output the timestamps correctly.
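The bug discussed in this thread is about a read transform discarding the per-element timestamps assigned by the underlying source (elements come out stamped with a default such as -inf instead). A minimal plain-Java sketch of the two shapes, with hypothetical stand-in types rather than Beam's `TimestampedValue`/`DoFn` API:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch contrasting "re-stamp with a default" vs "pass the
// source's timestamp through" (the latter is what the fix amounts to).
class TimestampSketch {
    static class Timestamped {
        final String value;
        final Instant ts;
        Timestamped(String value, Instant ts) { this.value = value; this.ts = ts; }
    }

    // Buggy shape: every element is re-stamped with a default timestamp
    // (Instant.MIN stands in for a "-inf" minimum timestamp), so the
    // source-assigned timestamps are lost.
    static List<Timestamped> emitWithDefault(List<Timestamped> fromSource) {
        List<Timestamped> out = new ArrayList<>();
        for (Timestamped tv : fromSource) out.add(new Timestamped(tv.value, Instant.MIN));
        return out;
    }

    // Fixed shape: each element keeps the timestamp its source assigned,
    // analogous to emitting with an explicit per-element timestamp.
    static List<Timestamped> emitWithTimestamp(List<Timestamped> fromSource) {
        return new ArrayList<>(fromSource);
    }

    public static void main(String[] args) {
        List<Timestamped> src = new ArrayList<>();
        src.add(new Timestamped("line-1", Instant.ofEpochSecond(100)));
        System.out.println(emitWithDefault(src).get(0).ts);   // default: timestamp lost
        System.out.println(emitWithTimestamp(src).get(0).ts); // source timestamp preserved
    }
}
```

Under this framing, the "-inf vs. current time" question in the comment is about which default the buggy path was stamping elements with, not about the fix itself.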