[ https://issues.apache.org/jira/browse/BEAM-13421?focusedWorklogId=697528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-697528 ]

ASF GitHub Bot logged work on BEAM-13421:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Dec/21 21:57
            Start Date: 16/Dec/21 21:57
    Worklog Time Spent: 10m 
      Work Description: codecov[bot] edited a comment on pull request #16258:
URL: https://github.com/apache/beam/pull/16258#issuecomment-996201095


   # [Codecov](https://codecov.io/gh/apache/beam/pull/16258?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#16258](https://codecov.io/gh/apache/beam/pull/16258?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (67818cd) into [master](https://codecov.io/gh/apache/beam/commit/15048929495ad66963b528d5bd71eb7b4a844c96?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1504892) will **increase** coverage by `37.52%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/beam/pull/16258/graphs/tree.svg?width=650&height=150&src=pr&token=qcbbAh8Fj1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/beam/pull/16258?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #16258       +/-   ##
   ===========================================
   + Coverage   46.13%   83.66%   +37.52%     
   ===========================================
     Files         197      447      +250     
     Lines       19519    61705    +42186     
   ===========================================
   + Hits         9006    51626    +42620     
   - Misses       9542    10079      +537     
   + Partials      971        0      -971     
   ```
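The percentages in the Coverage Diff above follow from its Hits, Misses, and Partials rows. The sketch below reproduces all three figures. It assumes Codecov's default settings: coverage is hits / (hits + misses + partials), shown to two decimal places and rounded down. It also shows why the change is reported as 37.52% rather than 83.66 - 46.13 = 37.53: the difference appears to be taken before rounding.

```python
from fractions import Fraction
import math


def coverage_ratio(hits: int, misses: int, partials: int) -> Fraction:
    """Exact coverage ratio: hits / (hits + misses + partials)."""
    return Fraction(hits, hits + misses + partials)


def to_percent_down(ratio: Fraction) -> str:
    """Format a ratio as a percentage with two decimals, rounding down
    (assumed to match Codecov's default `round: down`, `precision: 2`)."""
    hundredths = math.floor(ratio * 10000)  # percent * 100, truncated
    return f"{hundredths // 100}.{hundredths % 100:02d}"


master = coverage_ratio(hits=9006, misses=9542, partials=971)  # 19519 lines
pr = coverage_ratio(hits=51626, misses=10079, partials=0)      # 61705 lines

print(to_percent_down(master))       # 46.13
print(to_percent_down(pr))           # 83.66
print(to_percent_down(pr - master))  # 37.52, computed from the exact ratios
```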
   
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/16258?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/dataframe/frames.py](https://codecov.io/gh/apache/beam/pull/16258/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZGF0YWZyYW1lL2ZyYW1lcy5weQ==) | `94.90% <100.00%> (ø)` | |
   | [sdks/go/pkg/beam/provision/provision.go](https://codecov.io/gh/apache/beam/pull/16258/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9wcm92aXNpb24vcHJvdmlzaW9uLmdv) | | |
   | [sdks/go/pkg/beam/core/graph/scope.go](https://codecov.io/gh/apache/beam/pull/16258/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL2dyYXBoL3Njb3BlLmdv) | | |
   | [sdks/go/pkg/beam/core/util/reflectx/structs.go](https://codecov.io/gh/apache/beam/pull/16258/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL3V0aWwvcmVmbGVjdHgvc3RydWN0cy5nbw==) | | |
   | [...pkg/beam/runners/dataflow/dataflowlib/translate.go](https://codecov.io/gh/apache/beam/pull/16258/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9ydW5uZXJzL2RhdGFmbG93L2RhdGFmbG93bGliL3RyYW5zbGF0ZS5nbw==) | | |
   | [sdks/go/pkg/beam/core/runtime/graphx/dataflow.go](https://codecov.io/gh/apache/beam/pull/16258/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL3J1bnRpbWUvZ3JhcGh4L2RhdGFmbG93Lmdv) | | |
   | [sdks/go/pkg/beam/metrics.go](https://codecov.io/gh/apache/beam/pull/16258/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9tZXRyaWNzLmdv) | | |
   | [sdks/go/pkg/beam/core/runtime/metricsx/urns.go](https://codecov.io/gh/apache/beam/pull/16258/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL3J1bnRpbWUvbWV0cmljc3gvdXJucy5nbw==) | | |
   | [sdks/go/pkg/beam/artifact/materialize.go](https://codecov.io/gh/apache/beam/pull/16258/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9hcnRpZmFjdC9tYXRlcmlhbGl6ZS5nbw==) | | |
   | [sdks/go/pkg/beam/testing/passert/equals.go](https://codecov.io/gh/apache/beam/pull/16258/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS90ZXN0aW5nL3Bhc3NlcnQvZXF1YWxzLmdv) | | |
   | ... and [635 more](https://codecov.io/gh/apache/beam/pull/16258/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/16258?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/16258?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [1504892...67818cd](https://codecov.io/gh/apache/beam/pull/16258?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 697528)
    Time Spent: 0.5h  (was: 20m)

> Python DeferredDataFrame.xs differs from Pandas
> -----------------------------------------------
>
>                 Key: BEAM-13421
>                 URL: https://issues.apache.org/jira/browse/BEAM-13421
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-dataframe
>    Affects Versions: 2.34.0
>         Environment: Tested in Jupyter Notebook running in Docker.
> The Dockerfile is a modified version of
> https://github.com/fozziethebeat/gpu-jupyter/blob/master/.build/Dockerfile
>            Reporter: Keith Stevens
>            Assignee: Brian Hulette
>            Priority: P2
>             Fix For: 2.36.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When testing the `xs` method on DeferredDataFrames, I'm seeing results that
> are inconsistent with pandas. The two minimal examples below show the errors.
>  
> First inconsistency: Beam's `xs` requires at least one index level to remain
> after the selection, while pandas does not.
> {code:python}
> import apache_beam as beam
> import numpy as np
> import pandas as pd
> from apache_beam.dataframe.io import read_parquet
> from apache_beam.options.pipeline_options import PipelineOptions
>
> with beam.Pipeline(options=PipelineOptions()) as pipeline:
>     df = pd.DataFrame(
>         np.array([
>             ['state', 'day1', 12],
>             ['state', 'day1', 1],
>             ['state', 'day2', 14],
>             ['county', 'day1', 9],
>         ]),
>         columns=['provider', 'time', 'value'])
>     # Create just one index field
>     df = df.set_index(['provider'])
>     df.to_parquet('test.parquet')
>
>     # Should print out
>     #           time value
>     # provider
>     # state     day1    12
>     # state     day1     1
>     # state     day2    14
>     print(df.xs('state'))
>
>     # Should write the same data to a CSV file, but instead dies with
>     #   Cannot remove 1 levels from an index with 1 levels: at least one
>     #   level must be left.
>     test_df = pipeline | read_parquet('test.parquet')
>     test_df.xs('state').to_csv('test.csv')
> {code}
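A plain-pandas check of the first case, without Beam or Parquet, is sketched below. It assumes a recent pandas release. It builds the frame with native dtypes rather than `np.array`, because `np.array` would turn `value` into strings. The sketch also shows that the message Beam reports matches the `ValueError` text that pandas raises from `Index.droplevel` when asked to remove the only level of an index. That match suggests, but does not prove, that Beam's `xs` always drops the selected level.

```python
import pandas as pd

# Same rows as the report, using native dtypes instead of np.array.
df = pd.DataFrame(
    {
        "provider": ["state", "state", "state", "county"],
        "time": ["day1", "day1", "day2", "day1"],
        "value": [12, 1, 14, 9],
    }
).set_index("provider")

# pandas: xs on a single-level index returns the matching rows and
# keeps the `provider` index.
result = df.xs("state")
print(result)

# Dropping the only level of an index produces the error text seen in Beam.
try:
    df.index.droplevel(0)
    droplevel_error = None
except ValueError as exc:
    droplevel_error = str(exc)
print(droplevel_error)
```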
> Second inconsistency: even with a second index level, Beam dies without a
> clear error message.
> {code:python}
> import apache_beam as beam
> import numpy as np
> import pandas as pd
> from apache_beam.dataframe.io import read_parquet
> from apache_beam.options.pipeline_options import PipelineOptions
>
> with beam.Pipeline(options=PipelineOptions()) as pipeline:
>     df = pd.DataFrame(
>         np.array([
>             ['state', 'day1', 12],
>             ['state', 'day1', 1],
>             ['state', 'day2', 14],
>             ['county', 'day1', 9],
>         ]),
>         columns=['provider', 'time', 'value'])
>     # Create two index fields to satisfy Beam
>     df = df.set_index(['provider', 'time'])
>     df.to_parquet('test.parquet')
>
>     # Should print out
>     #      value
>     # time
>     # day1    12
>     # day1     1
>     # day2    14
>     print(df.xs('state'))
>
>     # Dies with no clear error message at
>     #   /opt/conda/lib/python3.9/site-packages/apache_beam/dataframe/transforms.py
>     #   in output_partitioning_in_stage(expr, stage)
>     #   305
>     #   306       # Anything that's not an input must have arguments
>     #   307       assert len(expr.args())
>     #   308
>     #   309       arg_partitionings = set(
>     test_df = pipeline | read_parquet('test.parquet')
>     test_df.xs('state').to_csv('test.csv')
> {code}
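The expected output of the second case can be checked in plain pandas, without Beam or Parquet. This minimal sketch assumes a recent pandas release and uses native dtypes rather than `np.array`. On a two-level index, `xs('state')` matches on the first level and drops that level by default, so `time` becomes the index. Passing `drop_level=False` keeps both levels.

```python
import pandas as pd

# Same rows as the report, indexed by (provider, time).
df = pd.DataFrame(
    {
        "provider": ["state", "state", "state", "county"],
        "time": ["day1", "day1", "day2", "day1"],
        "value": [12, 1, 14, 9],
    }
).set_index(["provider", "time"])

# Select rows whose first index level is "state"; by default pandas drops
# that level, leaving `time` as the index.
result = df.xs("state")
print(result)

# drop_level=False keeps the full (provider, time) MultiIndex instead.
kept = df.xs("state", drop_level=False)
print(kept)
```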



--
This message was sent by Atlassian Jira
(v8.20.1#820001)