philipportner opened a new pull request, #2288:
URL: https://github.com/apache/systemds/pull/2288
Adds a test case with inputs that have multiple groups with varying row
counts.
This pattern comes from a `lineorder.csv` example dataset that currently
causes a runtime exception for the `permutation-matrix` approach but works for
the `nested-loop` approach.
Why this happened:
- the `permutation-matrix` approach allocated space assuming every group has
`maxRowsInGroup` rows
- groups can have different sizes, so `Y_temp_reduce` ends up with fewer
rows than the reshape expects
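
To make the size mismatch concrete, here is a minimal Python sketch (not the DML in `raGroupby.dml`). It counts the group sizes for column 2 of the `lineorder.csv` sample in this description, which is the column passed to `m_raGroupby` in the reproduction script. The helper `group_sizes` is hypothetical and only for illustration.

```python
from collections import Counter

# Values of column 2 (1-based) from the lineorder.csv sample in this PR.
keys = [1, 2, 3, 4, 1, 2, 3, 4, 5, 1]

def group_sizes(keys):
    """Return {key: number of rows that carry this key}."""
    return dict(Counter(keys))

sizes = group_sizes(keys)
max_rows_in_group = max(sizes.values())

# Rows that would exist if every group had max_rows_in_group rows.
assumed_rows = len(sizes) * max_rows_in_group
# Rows that are actually present in the input.
actual_rows = len(keys)

print(sizes)
print(assumed_rows, actual_rows)
```

For this sample the groups have 3, 2, 2, 2, and 1 rows, so an allocation that assumes 5 full groups of 3 rows expects 15 rows while only 10 exist.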
Changes:
- correctly pads the matrix when groups do not all have `maxRowsInGroup`
rows
- adds test cases that cover this pattern
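
The padding idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the code changed in this PR: the helper `pad_groups` is hypothetical, and the zero fill value is an assumption about what a padded row contains.

```python
def pad_groups(rows, key_index, fill=0.0):
    """Group rows by rows[i][key_index] and pad every group to the
    size of the largest group. Assumes rows is non-empty."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key_index], []).append(row)

    max_rows = max(len(g) for g in groups.values())
    width = len(rows[0])

    padded = {}
    for key, g in groups.items():
        missing = max_rows - len(g)
        # Append filler rows so each group has exactly max_rows rows.
        padded[key] = g + [[fill] * width for _ in range(missing)]
    return padded, max_rows

# Two rows in group 1.0, one row in group 2.0.
rows = [[1.0, 10.0], [1.0, 11.0], [2.0, 20.0]]
padded, max_rows = pad_groups(rows, key_index=0)
print(padded)
```

After padding, every group has `max_rows` rows, so a reshape that expects `numGroups * max_rows` rows sees a matrix of the expected size.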
To reproduce original crash:
`lineorder.csv`:
```
0,1,2,3,4
1.0,1.0,18238.0,155190.0,828.0
1.0,2.0,18238.0,67310.0,163.0
1.0,3.0,18238.0,63700.0,71.0
1.0,4.0,18238.0,2132.0,943.0
2.0,1.0,20612.0,106170.0,1066.0
2.0,2.0,20612.0,194509.0,602.0
2.0,3.0,20612.0,100164.0,138.0
2.0,4.0,20612.0,45803.0,1382.0
2.0,5.0,20612.0,4439.0,1684.0
3.0,1.0,13813.0,4297.0,1959.0
```
`crash.dml`:
```
path_to_lineorder = "lineorder.csv"
X = read(path_to_lineorder, format = "csv", header=TRUE, sep = ",")
source("./scripts/builtin/raGroupby.dml") as ra_new
Y = ra_new::m_raGroupby(X, 2, "nested-loop")
print(toString(Y)) # nested-loop works
Y = ra_new::m_raGroupby(X, 2, "permutation-matrix")
print(toString(Y)) # permutation-matrix breaks
```