[jira] [Commented] (SPARK-12360) Support using 64-bit long type in SparkR

2016-01-25 Thread Dmitriy Selivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115376#comment-15115376
 ] 

Dmitriy Selivanov commented on SPARK-12360:
---

+1 for bit64

> Support using 64-bit long type in SparkR
> 
>
> Key: SPARK-12360
> URL: https://issues.apache.org/jira/browse/SPARK-12360
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 1.5.2
>Reporter: Sun Rui
>
> R has no native support for 64-bit integers, while some methods in the 
> Scala/Java API take one or more arguments of long type. Currently, for such 
> parameters we only support passing an integer cast from a numeric to the 
> Scala/Java side. This may be a problem when working with large data sets.
> Storing a 64-bit integer in a double does not work, as some 64-bit 
> integers cannot be exactly represented in double format, so x and x+1 can't 
> be distinguished.
> There is a bit64 package 
> (https://cran.r-project.org/web/packages/bit64/index.html) on CRAN which 
> supports vectors of 64-bit integers. We can investigate whether it can be used 
> for this purpose.
> Two questions are:
> 1. Is the license acceptable?
> 2. This will make SparkR depend on a non-base third-party package, which 
> may complicate deployment.
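
The precision concern in the quoted description can be illustrated with a minimal Scala sketch. The object and method names are hypothetical and not part of Spark or SparkR; the sketch only converts a long just above 2^53 to a double and back, which is roughly what happens when R holds such a value as a numeric.

```scala
// Illustrative only: DoublePrecisionDemo and roundTrip are hypothetical names,
// not part of Spark or SparkR.
object DoublePrecisionDemo {
  // 2^53 + 1 is the smallest positive integer a double cannot represent exactly.
  val x: Long = (1L << 53) + 1

  // Convert a long to a double and back, mimicking storage as an R numeric.
  def roundTrip(v: Long): Long = v.toDouble.toLong

  def main(args: Array[String]): Unit = {
    println(s"original:   $x")
    println(s"round trip: ${roundTrip(x)}")
    println(s"x and x - 1 collide as doubles: ${x.toDouble == (x - 1).toDouble}")
  }
}
```

Values whose magnitude is at most 2^53 survive the round trip unchanged, so the problem only appears for very large longs.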



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12360) Support using 64-bit long type in SparkR

2015-12-20 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065989#comment-15065989
 ] 

Sun Rui commented on SPARK-12360:
-

There will be NumberFormatExceptions for invalid formats, which will be caught 
and sent back to the R side.
{code}
scala> "123.23".toLong
java.lang.NumberFormatException: For input string: "123.23"
{code}
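
One way the exception could be caught on the JVM side and turned into a message for R is sketched below. This is only an assumption about the shape of such a helper: `ParseLongDemo` and `parseLong` are hypothetical names, and how the message actually travels back to R is not shown.

```scala
import scala.util.{Failure, Success, Try}

// Illustrative only: ParseLongDemo and parseLong are hypothetical names,
// not part of Spark or SparkR.
object ParseLongDemo {
  // Parse a string received from R. Left carries the error message that
  // would be reported back to the R side; Right carries the parsed long.
  def parseLong(s: String): Either[String, Long] =
    Try(s.toLong) match {
      case Success(v)                        => Right(v)
      case Failure(e: NumberFormatException) => Left(e.getMessage)
      case Failure(e)                        => throw e // unrelated failures propagate
    }
}
```

With this sketch, `ParseLongDemo.parseLong("123.23")` yields a `Left` holding the exception message, while a well-formed string yields a `Right`.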





[jira] [Commented] (SPARK-12360) Support using 64-bit long type in SparkR

2015-12-18 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065207#comment-15065207
 ] 

Felix Cheung commented on SPARK-12360:
--

+1 string.
How would a parse error (e.g. expecting an int but getting "123.23") be 
communicated back to R though?




[jira] [Commented] (SPARK-12360) Support using 64-bit long type in SparkR

2015-12-18 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065263#comment-15065263
 ] 

Shivaram Venkataraman commented on SPARK-12360:
---

Yeah, we can return an integer error code and check it on the R side or 
something like that (or optionally add a checkLong function in Scala?).
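
A rough sketch of what such a checkLong function might look like follows. No such function is known to exist in Spark; the name comes from the suggestion above, and the status-code values are assumptions made purely for illustration.

```scala
// Illustrative only: this checkLong and its status codes are hypothetical,
// sketched from the suggestion in this comment, not an existing Spark API.
object CheckLongDemo {
  val Ok = 0            // the string parses as a 64-bit long
  val InvalidFormat = 1 // the string is not a valid long

  // Return an integer status code instead of throwing, so the R side
  // can check the result before calling the real API method.
  def checkLong(s: String): Int =
    try {
      s.toLong
      Ok
    } catch {
      case _: NumberFormatException => InvalidFormat
    }
}
```

The R side could then test the returned code and raise an R error with a suitable message when it is non-zero.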




[jira] [Commented] (SPARK-12360) Support using 64-bit long type in SparkR

2015-12-17 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063352#comment-15063352
 ] 

Shivaram Venkataraman commented on SPARK-12360:
---

For all the random seed cases we should just accept integers on the R side and 
then convert them to long in Scala (through a util function?). This is partly 
because set.seed in R takes an integer, so I assume R users are fine with 
integer seeds.

For the window spec functions it's a bit more tricky. I guess the start and end 
refer to row indices here? If so I see the concern. But I am still not sure 
it's a good reason to add a whole new dependency etc. A simple workaround would 
be to support variants of rangeBetween and rowsBetween which also take Strings 
as arguments and then do a `.toLong` on the Scala side. So in case somebody 
wants to use something bigger than an integer, they can use that?
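
Both ideas can be sketched in a few lines of Scala. The helper names below are hypothetical and the sketch deliberately avoids any Spark types; it only shows widening an integer seed and parsing string bounds into longs.

```scala
// Illustrative only: LongArgsDemo, seedToLong and parseBounds are hypothetical
// names, not part of Spark or SparkR.
object LongArgsDemo {
  // Widen a 32-bit integer seed coming from R to the long the Scala API expects.
  def seedToLong(seed: Int): Long = seed.toLong

  // Parse window bounds passed as strings, so values outside the 32-bit
  // integer range reach the JVM without going through a double.
  // Throws NumberFormatException if either string is not a valid long.
  def parseBounds(start: String, end: String): (Long, Long) =
    (start.toLong, end.toLong)
}
```

The resulting pair could then be handed to the existing long-typed rangeBetween or rowsBetween methods.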




[jira] [Commented] (SPARK-12360) Support using 64-bit long type in SparkR

2015-12-17 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063365#comment-15063365
 ] 

Sun Rui commented on SPARK-12360:
-

Yes, using a string to represent a long value and parsing it in the JVM is an 
interesting idea. +1 for this. Let's see if there are any more comments.




[jira] [Commented] (SPARK-12360) Support using 64-bit long type in SparkR

2015-12-17 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063300#comment-15063300
 ] 

Sun Rui commented on SPARK-12360:
-

This JIRA was intended to address the issue of passing long values to Scala/Java 
API methods which require one or more arguments of long type. Currently, some 
functions in SparkR fall into this case:
{code}
def sample(withReplacement: Boolean, fraction: Double, seed: Long)
def rand(seed: Long): Column
def randn(seed: Long): Column
def rangeBetween(start: Long, end: Long): WindowSpec
def rowsBetween(start: Long, end: Long): WindowSpec
{code}
What concerns me is rangeBetween() and rowsBetween(). If a window is very 
large, there is currently no way in SparkR to specify a precise window boundary.
But there seems to be an API inconsistency in window functions between using int 
and long types; for example, lead()/lag() use int to specify the row offset.

Your comment makes me think of another thing: when collecting a DataFrame, 
values of long type are converted to numeric in R. However, these may lose 
precision if any further processing is done on such values in R. We could 
collect values of long type as integer64 if bit64 is supported.




[jira] [Commented] (SPARK-12360) Support using 64-bit long type in SparkR

2015-12-16 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060381#comment-15060381
 ] 

Shivaram Venkataraman commented on SPARK-12360:
---

The lack of 64-bit numbers is a limitation in R, but I'd like to understand the 
use cases where this comes up before trying a complex fix. My understanding is 
that long values from JSON / HDFS / Parquet etc. will be read correctly because 
they go through the Scala layers, and the problem only comes up when somebody 
does a collect / UDF? If so, I think the problem may not be that important, as R 
users probably wouldn't expect long types to work in the R shell. 

Also it might lead to another solution where we don't add a dependency on 
bit64, but we check if bit64 is available and if so we avoid the truncation to 
double etc.
