[jira] [Resolved] (BEAM-1410) Reduce sdk-py DirectRunner running time and memory consumption
[ https://issues.apache.org/jira/browse/BEAM-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Younghee Kwon resolved BEAM-1410.
Resolution: Fixed
Fix Version/s: 0.6.0

> Reduce sdk-py DirectRunner running time and memory consumption
>
> Key: BEAM-1410
> URL: https://issues.apache.org/jira/browse/BEAM-1410
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Younghee Kwon
> Assignee: Ahmet Altay
> Priority: Minor
> Labels: performance, python
> Fix For: 0.6.0
>
> Some experimental benchmarks show that DirectRunner's CPU and memory performance can be improved.
> I will roll out some CLs to improve them.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (BEAM-1496) pysdk's sideinputs_test requires nose, but not installed by default
[ https://issues.apache.org/jira/browse/BEAM-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Younghee Kwon closed BEAM-1496.
Resolution: Not A Problem
Fix Version/s: Not applicable
[jira] [Commented] (BEAM-1496) pysdk's sideinputs_test requires nose, but not installed by default
[ https://issues.apache.org/jira/browse/BEAM-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869200#comment-15869200 ]

Younghee Kwon commented on BEAM-1496:

I see; sorry for the noise. I thought it might have broken the automated tests, but I confirmed that travis-ci passes. Closing.
[jira] [Commented] (BEAM-1496) pysdk's sideinputs_test requires nose, but not installed by default
[ https://issues.apache.org/jira/browse/BEAM-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868926#comment-15868926 ]

Younghee Kwon commented on BEAM-1496:

I could add nose to setup.py, but the notice on its site discourages me:
https://nose.readthedocs.io/en/latest/

  Note to Users
  Nose has been in maintenance mode for the past several years and will likely cease without a new person/team to take over maintainership. New projects should consider using Nose2, py.test, or just plain unittest/unittest2.
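BEAM-1496 was closed as Not A Problem because the automated tests run with nose available. For running a test module directly in an environment without nose, one hypothetical option is to guard the import and fall back to a local stand-in for attr. The sketch assumes the module only uses attr to tag tests (nose's attribute plugin works by setting attributes on the decorated object); the tag name example_tag and the test class are made up for illustration and are not taken from sideinputs_test.

```python
import unittest

try:
    from nose.plugins.attrib import attr
except ImportError:
    # Hypothetical fallback used only when nose is missing: mimic attr() by
    # setting the given attributes on the decorated test. nose's attribute
    # plugin filters on these; without nose, nothing reads them.
    def attr(*args, **kwargs):
        def wrap(obj):
            for name in args:
                setattr(obj, name, True)
            for name, value in kwargs.items():
                setattr(obj, name, value)
            return obj
        return wrap


class ExampleTaggedTest(unittest.TestCase):
    # 'example_tag' is a hypothetical tag name, not one used by Beam.
    @attr('example_tag')
    def test_runs_without_nose(self):
        self.assertEqual(sum([1, 2, 3]), 6)
```

With the usual unittest.main() guard appended, such a module would no longer fail at import time when run with python -m and nose is not installed.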
[jira] [Created] (BEAM-1496) pysdk's sideinputs_test requires nose, but not installed by default
Younghee Kwon created BEAM-1496:

Summary: pysdk's sideinputs_test requires nose, but not installed by default
Key: BEAM-1496
URL: https://issues.apache.org/jira/browse/BEAM-1496
Project: Beam
Issue Type: Bug
Components: sdk-py
Reporter: Younghee Kwon
Assignee: Ahmet Altay
Priority: Minor

$ PYTHONPATH= python -m apache_beam.transforms.sideinputs_test

No handlers could be found for logger "oauth2client.contrib.multistore_file"
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/google/home/youngheek/work/github/beam3/sdks/python/apache_beam/transforms/sideinputs_test.py", line 23, in <module>
    from nose.plugins.attrib import attr
ImportError: No module named nose.plugins.attrib
[jira] [Commented] (BEAM-588) All runners should support ProfilingOptions
[ https://issues.apache.org/jira/browse/BEAM-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856468#comment-15856468 ]

Younghee Kwon commented on BEAM-588:

The PR is about to be merged. Several things remain for a follow-up PR:
1. Integrate the memory reporter into DirectRunner using PipelineOptions.
2. Add an option to dump the full profile to disk (as opposed to logging only the 10 biggest entries, as now).
3. (Optional) Experiment with other profilers for platforms where guppy is not available.

> All runners should support ProfilingOptions
>
> Key: BEAM-588
> URL: https://issues.apache.org/jira/browse/BEAM-588
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Ahmet Altay
> Assignee: Ahmet Altay
> Priority: Minor
>
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/options.py#L366
> This is useful for profiling pipelines in different environments.
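The follow-up items contrast logging only the 10 biggest entries with dumping the full profile, and mention trying other profilers where guppy is not available. As a minimal sketch of that idea, the standard library's tracemalloc module (Python 3.4+) is swapped in for guppy here; this is not Beam's memory reporter, and report_memory is a hypothetical helper name.

```python
import logging
import tracemalloc


def report_memory(top_n=10, dump_path=None):
    """Log the top_n allocation sites and optionally dump the full profile.

    Hypothetical helper: assumes tracemalloc.start() was called earlier,
    otherwise take_snapshot() raises RuntimeError.
    """
    snapshot = tracemalloc.take_snapshot()
    # Statistics grouped by source line, sorted from largest to smallest.
    top_stats = snapshot.statistics('lineno')[:top_n]
    for stat in top_stats:
        logging.info('memory: %s', stat)
    if dump_path is not None:
        # Full snapshot on disk; reload later with tracemalloc.Snapshot.load().
        snapshot.dump(dump_path)
    return top_stats
```

Call tracemalloc.start() before the work being profiled. Logging only the top entries keeps logs short, while dump_path keeps everything for offline analysis, which mirrors the split described in item 2.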
[jira] [Created] (BEAM-1410) Reduce sdk-py DirectRunner running time and memory consumption
Younghee Kwon created BEAM-1410:

Summary: Reduce sdk-py DirectRunner running time and memory consumption
Key: BEAM-1410
URL: https://issues.apache.org/jira/browse/BEAM-1410
Project: Beam
Issue Type: Improvement
Components: sdk-py
Reporter: Younghee Kwon
Assignee: Ahmet Altay
Priority: Minor

Some experimental benchmarks show that DirectRunner's CPU and memory performance can be improved. I will roll out some CLs to improve them.
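Benchmarks like the ones mentioned in BEAM-1410 need the wall-clock time and peak memory of a single run. A minimal Python 3 sketch using only the standard library follows; measure is a hypothetical helper, and tracemalloc counts only allocations made through Python's allocator, so it understates total process memory. In practice fn would build and run a small pipeline on the DirectRunner; that part is left out so the sketch runs without Beam installed.

```python
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds, peak_traced_bytes).

    Hypothetical benchmarking helper. Assumes tracemalloc is not already
    tracing, because it stops tracing when it finishes.
    """
    tracemalloc.start()
    try:
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak
```

Running the same workload before and after a change and comparing elapsed and peak gives a rough, repeatable signal; tracing itself adds overhead, so absolute timings are inflated.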
[jira] [Closed] (BEAM-1246) Update README.md to remove incubating notion
[ https://issues.apache.org/jira/browse/BEAM-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Younghee Kwon closed BEAM-1246.
Resolution: Fixed
Fix Version/s: Not applicable

PR merged.

> Update README.md to remove incubating notion
>
> Key: BEAM-1246
> URL: https://issues.apache.org/jira/browse/BEAM-1246
> Project: Beam
> Issue Type: Task
> Components: sdk-py
> Reporter: Younghee Kwon
> Assignee: Ahmet Altay
> Priority: Trivial
> Labels: documentation
> Fix For: Not applicable

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (BEAM-1233) Implement TFRecordIO (Reading/writing Tensorflow Standard format)
[ https://issues.apache.org/jira/browse/BEAM-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Younghee Kwon closed BEAM-1233.

> Implement TFRecordIO (Reading/writing Tensorflow Standard format)
>
> Key: BEAM-1233
> URL: https://issues.apache.org/jira/browse/BEAM-1233
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py
> Reporter: Younghee Kwon
> Assignee: Ahmet Altay
> Fix For: Not applicable
>
> TensorFlow is an open source machine learning project that is getting a lot of attention these days. Apache Beam can be a good preprocessing tool for it; however, TensorFlow supports a limited number of input file formats: only CSV and its own record format (so-called TFRecord).
> On the other hand, Apache Beam doesn't support reading/writing the TFRecord format. It would be useful if Beam supported TFRecordIO natively.
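To make concrete what reading and writing TFRecord involves: as documented by TensorFlow, each record is framed as a little-endian uint64 length, a masked CRC32C of those 8 length bytes, the payload, and a masked CRC32C of the payload. The sketch below implements that framing in pure Python with a bitwise CRC32C. It is illustrative only (slow, no compression support) and is not Beam's TFRecordIO implementation; encode_record and decode_records are hypothetical names.

```python
import struct

_CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial


def crc32c(data):
    """Bitwise CRC-32C (Castagnoli); slow but dependency-free."""
    crc = 0xFFFFFFFF
    for byte in bytearray(data):
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (_CRC32C_POLY & -(crc & 1))
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data):
    """Rotate the CRC right by 15 bits and add a constant, as TFRecord does."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def encode_record(data):
    """Frame one payload: length, masked CRC of length, data, masked CRC of data."""
    length = struct.pack('<Q', len(data))
    return (length + struct.pack('<I', masked_crc32c(length)) +
            data + struct.pack('<I', masked_crc32c(data)))


def decode_records(buf):
    """Parse concatenated records from bytes, verifying both checksums."""
    records, offset = [], 0
    while offset < len(buf):
        if len(buf) - offset < 12:
            raise ValueError('truncated record header')
        length_bytes = buf[offset:offset + 8]
        (length,) = struct.unpack('<Q', length_bytes)
        (length_crc,) = struct.unpack('<I', buf[offset + 8:offset + 12])
        if length_crc != masked_crc32c(length_bytes):
            raise ValueError('corrupt length at offset %d' % offset)
        start, end = offset + 12, offset + 12 + length
        if len(buf) < end + 4:
            raise ValueError('truncated record payload')
        data = buf[start:end]
        (data_crc,) = struct.unpack('<I', buf[end:end + 4])
        if data_crc != masked_crc32c(data):
            raise ValueError('corrupt payload at offset %d' % offset)
        records.append(data)
        offset = end + 4
    return records
```

Each record costs 16 bytes of framing, and the separate length checksum lets a reader reject a corrupted length before trying to read a bogus payload size.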
[jira] [Closed] (BEAM-1245) Use @unittest.skip instead of try/except in avroio_test
[ https://issues.apache.org/jira/browse/BEAM-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Younghee Kwon closed BEAM-1245.
Resolution: Fixed
Fix Version/s: Not applicable

PR 1736 merged.
[jira] [Resolved] (BEAM-1233) Implement TFRecordIO (Reading/writing Tensorflow Standard format)
[ https://issues.apache.org/jira/browse/BEAM-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Younghee Kwon resolved BEAM-1233.
Resolution: Fixed
Fix Version/s: Not applicable

The PR that adds TFRecordIO has been pushed to the python-sdk branch.
[jira] [Created] (BEAM-1246) Update README.md to remove incubating notion
Younghee Kwon created BEAM-1246:

Summary: Update README.md to remove incubating notion
Key: BEAM-1246
URL: https://issues.apache.org/jira/browse/BEAM-1246
Project: Beam
Issue Type: Task
Components: sdk-py
Reporter: Younghee Kwon
Assignee: Ahmet Altay
Priority: Trivial
[jira] [Created] (BEAM-1245) Use @unittest.skip instead of try/except in avroio_test
Younghee Kwon created BEAM-1245:

Summary: Use @unittest.skip instead of try/except in avroio_test
Key: BEAM-1245
URL: https://issues.apache.org/jira/browse/BEAM-1245
Project: Beam
Issue Type: Improvement
Components: sdk-py
Reporter: Younghee Kwon
Assignee: Ahmet Altay
Priority: Minor

As said in the summary.
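BEAM-1245 asks for unittest's skip decorators in place of a try/except, so that a missing optional dependency shows up as a skipped test in the report. A minimal sketch of the pattern follows, assuming the optional module is importable as avro; the test class and method are hypothetical and not copied from avroio_test.

```python
import unittest

try:
    import avro  # optional dependency; may not be installed
except ImportError:
    avro = None


# Class-level skip: every test in the class is reported as skipped,
# with the given reason, when the import above failed.
@unittest.skipIf(avro is None, 'avro is not installed')
class AvroSketchTest(unittest.TestCase):
    def test_avro_module_available(self):
        # Only executes when the optional import succeeded.
        self.assertIsNotNone(avro)
```

Compared with catching ImportError inside each test, the decorator keeps the test bodies free of dependency checks and records the skip reason in the test runner's output.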