[jira] [Updated] (BEAM-2395) BigtableIO for Python SDK

2017-06-01 Thread Matthias Baetens (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Baetens updated BEAM-2395:
---
Summary: BigtableIO for Python SDK  (was: BigTableIO for Python SDK)

> BigtableIO for Python SDK
> -------------------------
>
> Key: BEAM-2395
> URL: https://issues.apache.org/jira/browse/BEAM-2395
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Matthias Baetens
>Assignee: Matthias Baetens
>  Labels: features
>
> Developing a read and write IO for Bigtable for the Python SDK. 
> Working / design document can be found here: 
> https://docs.google.com/document/d/1iXeQvIAsGjp9orleDy0o5ExU-eMqWesgvtt231UoaPg/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-2395) BigTableIO for Python SDK

2017-06-01 Thread Matthias Baetens (JIRA)
Matthias Baetens created BEAM-2395:
--

 Summary: BigTableIO for Python SDK
 Key: BEAM-2395
 URL: https://issues.apache.org/jira/browse/BEAM-2395
 Project: Beam
  Issue Type: New Feature
  Components: sdk-py
Reporter: Matthias Baetens
Assignee: Ahmet Altay


Developing a read and write IO for BigTable for the Python SDK. 

Working / design document can be found here: 
https://docs.google.com/document/d/1iXeQvIAsGjp9orleDy0o5ExU-eMqWesgvtt231UoaPg/edit?usp=sharing





[jira] [Created] (BEAM-2122) Writing to partitioned BigQuery tables from Dataflow is causing errors

2017-04-28 Thread Matthias Baetens (JIRA)
Matthias Baetens created BEAM-2122:
--

 Summary: Writing to partitioned BigQuery tables from Dataflow is 
causing errors
 Key: BEAM-2122
 URL: https://issues.apache.org/jira/browse/BEAM-2122
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-gcp
 Environment: Running with Beam 0.7.0-SNAPSHOT version 48 for 
beam-sdks-java-io-google-cloud-platform, 49 for beam-sdks-java-core and 
beam-runners-google-cloud-dataflow-java in Eclipse using Dataflow service.
Reporter: Matthias Baetens
Assignee: Daniel Halperin


Using the latest Beam SNAPSHOT, which has a new BigQuery connector, and trying to 
write to partitioned tables according to the docs (or this Stack Overflow answer: 
http://stackoverflow.com/questions/43505534/writing-different-values-to-different-bigquery-tables-in-apache-beam/43655461#43655461):

static class PartitionedTableGeneration
    implements SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination> {
  @Override
  public TableDestination apply(ValueInSingleWindow<TableRow> value) {
    // String dayString =
    //     DateTimeFormat.forPattern("yyyy_MM_dd").withZone(DateTimeZone.UTC)
    String dayString =
        DateTimeFormat.forPattern("yyyyMMdd").withZone(DateTimeZone.UTC)
            .print(((IntervalWindow) value.getWindow()).start());
    TableDestination td = new TableDestination(
        "project:dataset.table" + "$" + dayString, "");
    return td;
  }
}

causes the following errors at run time, depending on how dayString is 
formatted:

1. "Invalid table ID \"partitioned_sample$20150905\". Table IDs must be 
alphanumeric (plus underscores) and must be at most 1024 characters long. Also, 
Table decorators cannot be used."
2. java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: 
java.lang.RuntimeException: Failed to create load job with id prefix 
...
"errorResult" : {
  "message" : "Invalid date partitioned table suffix: 2015_11_26",
  "reason" : "invalid"
}

Writing to sharded tables (without the '$' sign) works fine.
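
For illustration, a minimal Python sketch (a hypothetical helper, not part of 
the Beam SDK) of how the partition decorator is expected to be formed; BigQuery 
only accepts a yyyyMMdd suffix after the '$', which is why the 
underscore-separated 2015_11_26 form above is rejected:

```python
from datetime import datetime, timezone

def partition_decorator(table, window_start):
    """Build a partition-decorated BigQuery table name.

    BigQuery expects the partition suffix as YYYYMMDD with no
    separators; a suffix like 2015_11_26 is rejected as invalid.
    """
    suffix = window_start.strftime("%Y%m%d")
    return f"{table}${suffix}"

start = datetime(2015, 9, 5, tzinfo=timezone.utc)
print(partition_decorator("project:dataset.table", start))
# project:dataset.table$20150905
```

Note that even with the correct yyyyMMdd suffix, the report above shows the 
'$' decorator itself tripping the table-ID validation (error 1).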





[jira] [Comment Edited] (BEAM-842) dependency.py: package not found when running on Windows

2017-04-18 Thread Matthias Baetens (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973381#comment-15973381
 ] 

Matthias Baetens edited comment on BEAM-842 at 4/18/17 8:04 PM:


[~vectorijk], [~altay] sorry for the late reply, lost track of this. The 
problem was solved soon after I ran into it. Shall I close the issue?


was (Author: matthiasa4):
[vectorijk], [~altay] sorry for the late reply, lost track of this. The problem 
was solved soon after I ran into it. Shall I close the issue?

> dependency.py: package not found when running on Windows
> 
>
> Key: BEAM-842
> URL: https://issues.apache.org/jira/browse/BEAM-842
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
> Environment: Windows 10, Python 2.7.11
>Reporter: Matthias Baetens
>Priority: Minor
>  Labels: newbie
>
> When splitting your pipeline into multiple files and configuring your 
> project according to the Juliaset example 
> (https://cloud.google.com/dataflow/pipelines/dependencies-python#multiple-file-dependencies),
>  the pipeline still crashes on Windows.
> This is caused by setuptools defaulting to a .zip on Windows, while the 
> current Beam code looks for a .tar.gz (dependency.py, line 400). Changing this 
> line to output_files = glob.glob(os.path.join(temp_dir, '*.zip')) makes it work. 
> Suggestion: checking the OS would probably resolve this issue. 
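
The suggested OS check might look like the following sketch (the function name 
and structure are hypothetical; the real dependency.py code differs):

```python
import glob
import os
import sys

def find_sdist(temp_dir):
    """Locate the source package built by setuptools' sdist command.

    setuptools defaults to .zip archives on Windows and .tar.gz
    elsewhere, so glob for the platform-appropriate extension
    instead of hardcoding '*.tar.gz'.
    """
    pattern = '*.zip' if sys.platform == 'win32' else '*.tar.gz'
    return glob.glob(os.path.join(temp_dir, pattern))
```

On Windows, sdist produces a .zip by default, so globbing only for *.tar.gz 
comes back empty there, which matches the crash described above.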





[jira] [Commented] (BEAM-842) dependency.py: package not found when running on Windows

2017-04-18 Thread Matthias Baetens (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973381#comment-15973381
 ] 

Matthias Baetens commented on BEAM-842:
---

[vectorijk], [~altay] sorry for the late reply, lost track of this. The problem 
was solved soon after I ran into it. Shall I close the issue?

> dependency.py: package not found when running on Windows
> 
>
> Key: BEAM-842
> URL: https://issues.apache.org/jira/browse/BEAM-842
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
> Environment: Windows 10, Python 2.7.11
>Reporter: Matthias Baetens
>Priority: Minor
>  Labels: newbie
>
> When splitting your pipeline into multiple files and configuring your 
> project according to the Juliaset example 
> (https://cloud.google.com/dataflow/pipelines/dependencies-python#multiple-file-dependencies),
>  the pipeline still crashes on Windows.
> This is caused by setuptools defaulting to a .zip on Windows, while the 
> current Beam code looks for a .tar.gz (dependency.py, line 400). Changing this 
> line to output_files = glob.glob(os.path.join(temp_dir, '*.zip')) makes it work. 
> Suggestion: checking the OS would probably resolve this issue. 


