[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18945
  
**[Test build #82025 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82025/testReport)**
 for PR 18945 at commit 
[`18d8d72`](https://github.com/apache/spark/commit/18d8d72d8d81e92e03ff66be15fb0bf323e3997d).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18945
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82025/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18945
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18945
  
**[Test build #82025 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82025/testReport)**
 for PR 18945 at commit 
[`18d8d72`](https://github.com/apache/spark/commit/18d8d72d8d81e92e03ff66be15fb0bf323e3997d).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18945
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18945
  
**[Test build #82024 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82024/testReport)**
 for PR 18945 at commit 
[`18d8d72`](https://github.com/apache/spark/commit/18d8d72d8d81e92e03ff66be15fb0bf323e3997d).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18945
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82024/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18945
  
**[Test build #82024 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82024/testReport)**
 for PR 18945 at commit 
[`18d8d72`](https://github.com/apache/spark/commit/18d8d72d8d81e92e03ff66be15fb0bf323e3997d).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18945
  
@logannc, mind adding the JIRA number in this PR title as described in the 
guide line? Please take a look - http://spark.apache.org/contributing.html.

I'd read carefully the comments above, e.g., adding a test, 
https://github.com/apache/spark/pull/18945#issuecomment-323307343, fixing the 
PR title, https://github.com/apache/spark/pull/18945#issuecomment-323162796, 
following the suggestion , 
https://github.com/apache/spark/pull/18945#issuecomment-331008594.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18945
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-21 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18945
  
@logannc There are pandas related tests in `python/pyspark/sql/tests.py`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-20 Thread logannc
Github user logannc commented on the issue:

https://github.com/apache/spark/pull/18945
  
Hm. Where would I add tests?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18945
  
@HyukjinKwon I can take over this if @logannc can't find time to continue 
it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18945
  
We also need a proper test for this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-20 Thread logannc
Github user logannc commented on the issue:

https://github.com/apache/spark/pull/18945
  
Sorry I feel off the face of the earth. I finally had some time to sit down 
and do this. I took your suggestions but implemented it a little differently. 
Unless I've made a dumb mistake, I think I improved on it a bit.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18945
  
@BryanCutler, @a10y and @viirya, would you guys be interested in this and 
have some time to take over this with the different approach we discussed above 
-  https://github.com/apache/spark/pull/18945#issuecomment-323917328 and  
https://github.com/apache/spark/pull/18945#discussion_r134033952? I could take 
over this too if you guys are currently busy.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18945
  
gentle ping @logannc.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-09-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18945
  
Hey @logannc, have you had some time to work on this? I want to fix this 
issue asap. Ortherwise, would anyone here be interested in submitimg another PR 
for the another approach?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-08-27 Thread logannc
Github user logannc commented on the issue:

https://github.com/apache/spark/pull/18945
  
Sorry for the delay. Things got busy and now there is the storm in Houston. 
Will update this per these suggestions soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-08-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18945
  
BTW, I think it'd be nicer if we can go with the approach above ^ (checking 
the null in data and setting the correct type). I am okay with any form for the 
approach above as we have a decent Arrow optimization now for the performance 
aspect.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-08-21 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/18945
  
Thanks for clarifying @HyukjinKwon , I see what you mean now.  Since pandas 
will iterate over `self.collect()` anyway I don't think your solution would 
impact performance at all right?  So your way might be better, but it is 
slightly more complicated..

Just to sum things up - @logannc does this still meet your requirements? 
Instead of having the `strict = True` option we do the following:
```
for each nullable int32 column:
if there are null values:
change column type to float32
else:
change column type to int32
```

I'm also guessing we will have the same problem with nullable ShortType - 
maybe others? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-08-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18945
  
Yea, I think it is basically similar idea with 
https://github.com/apache/spark/pull/18945#discussion_r134033952.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-08-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18945
  
Ah, I had to be clear. I thought something like ...

```python
dtype = {}
for field in self.schema:
pandas_type = _to_corrected_pandas_type(field.dataType)
if pandas_type is not None:
dtype[field.name] = pandas_type

# Columns with int + nullable from schemaa
int_null_cols = [...]

# Columns with int + nullable but with actual None.
int_null_cols_with_none = [...]

# This functions checks if the value is None.
def check_nulls():
for row in rows:
# Check with int_null_cols and set int_null_cols_with_none if there 
is None.
yield rows

# Don't check anything if no int + nullable columns.
if len(int_null_cols) > 0:
check_func = check_nulls
else:
check_func = lambda r: r

pdf = pd.DataFrame.from_records(check_null(self.collect()), 
columns=self.columns)

# Replace int32 -> float one by checking int_null_cols.
dtype = ...

for f, t in dtype.items():
pdf[f] = pdf[f].astype(t, copy=False)
return pdf
```

So, I was thinking of checking the actual value in the data might be a way 
if we can't resolve this only with the schema.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-08-18 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/18945
  
> Another rough thought for a feasible way I could think to keep the 
current behaviour (to be more specific, to match the types with / without Arrow 
optimization, IIUC) is, to make a generator wrapper to check if None exists in 
the results for columns where the type is int and nullable.

I'm not sure I follow @HyukjinKwon , we can just look at the `pdf` when it 
is first created but before any type conversions and just not do anything if it 
the column is a nullable int with null values.  Is this similar to what you're 
suggesting?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-08-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18945
  
@logannc, mind adding the JIRA number in this PR title as described in the 
guide line?
I definitely also think this needs a JIRA as before and after are not 
virtually same and it looks a non-trivial change - this even could be a 
regression if SPARK-21163 was resolved in 2.2.0. I believe tests are required 
here too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-08-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18945
  
I agree with @BryanCutler in general. `None` makes more sense to me to 
infer the type in this case.

Another rough thought for a feasible way I could think to keep the current 
behaviour (to be more specific, to match the types with / without Arrow 
optimization, IIUC) is, to make a generator wrapper to check if `None` exists 
in the results for columns where the type is int and nullable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-08-17 Thread logannc
Github user logannc commented on the issue:

https://github.com/apache/spark/pull/18945
  
I read the contributing guide. It said that simple changes didn't need a 
JIRA. Certainly this code change is quite simple, I just wasn't sure if there 
would be enough discussion to warrant a Jira. Now I know.

So, rather than return np.float32, return None? That would probably also 
work, though the bug might get reintroduced by someone unfamiliar with the 
problem. That is why I prefered the explicitness of returning a type.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-08-17 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/18945
  
There should be no way an error like this is raised during the call 
`toPandas()` so I am thinking that if there is a nullable int column, the type 
should not try to be changed in `_to_corrected_pandas_type` cc @HyukjinKwon 
@ueshin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-08-17 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/18945
  
@logannc , thanks for this.  You bring up a big issue here that I think was 
overlooked when this code was added in Spark.  I filed a JIRA for this 
[SPARK-21766](https://issues.apache.org/jira/browse/SPARK-21766), which 
generally comes before the PR.  Please see the contributing guide 
[here](http://spark.apache.org/contributing.html)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18945
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org