[ 
https://issues.apache.org/jira/browse/SPARK-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063300#comment-15063300
 ] 

Sun Rui commented on SPARK-12360:
---------------------------------

This JIRA was intended to address the issue of passing long values to Scala/Java 
API methods that take one or more arguments of long type. SparkR currently wraps 
several such methods:
{code}
def sample(withReplacement: Boolean, fraction: Double, seed: Long)
def rand(seed: Long): Column
def randn(seed: Long): Column
def rangeBetween(start: Long, end: Long): WindowSpec
def rowsBetween(start: Long, end: Long): WindowSpec
{code}
What concerns me most are rangeBetween() and rowsBetween(): if a window is very 
large, there is currently no way in SparkR to specify a precise window boundary.
That said, the window-function API seems inconsistent in its use of int versus 
long types; for example, lead()/lag() use an int to specify the row offset.
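To make the window-boundary concern concrete, here is a minimal Scala sketch (plain JVM arithmetic, no Spark dependency) of what narrowing a numeric to a 32-bit integer does to a boundary larger than Int.MaxValue. The object and helper names are hypothetical, not SparkR internals, and the sketch assumes the narrowing behaves like a JVM Double-to-Int conversion, which saturates at Int.MaxValue (R's as.integer() would instead return NA for such a value).

{code:scala}
// Hypothetical illustration only: not SparkR's actual conversion code.
object LongBoundarySketch {
  // Narrowing path: R numeric (a double) -> 32-bit Int -> Long parameter.
  // On the JVM, Double.toInt saturates at Int.MaxValue for large inputs.
  def viaInt(fromR: Double): Long = fromR.toInt.toLong

  // Direct path: R numeric -> Long. Still exact only up to 2^53,
  // because the input itself is a double.
  def viaLong(fromR: Double): Long = fromR.toLong

  def main(args: Array[String]): Unit = {
    val boundary = 3e9 // a frame boundary larger than Int.MaxValue
    println(viaInt(boundary))  // 2147483647: boundary silently clamped
    println(viaLong(boundary)) // 3000000000
  }
}
{code}

Either way the requested boundary does not reach rowsBetween()/rangeBetween() intact once it passes through a 32-bit integer.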

Your comment reminds me of another issue: when a DataFrame is collected, values 
of long type are converted to numeric (double) in R. These values may lose 
precision if they are processed further in R. If bit64 is supported, we could 
collect values of long type as integer64 instead.
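A minimal sketch of the difference between collecting a long as an R numeric and an integer64-style transfer, assuming the JVM side would ship the raw 8 bytes of each value. All names here are hypothetical, and how SparkR's serializer would actually encode such values is not decided in this thread. bit64 documents integer64 as a double vector whose bits are interpreted as a 64-bit integer, so a byte-exact transfer could in principle be reinterpreted on the R side without loss.

{code:scala}
import java.nio.ByteBuffer

// Hypothetical illustration only: not SparkR's serializer.
object Integer64Sketch {
  // Current behaviour as described: a collected long becomes an R numeric,
  // i.e. the nearest double.
  def viaNumeric(x: Long): Long = x.toDouble.toLong

  // integer64-style path: send the raw 8 bytes of the long
  // (big-endian, ByteBuffer's default order) and rebuild them unchanged.
  def encode(x: Long): Array[Byte] = ByteBuffer.allocate(8).putLong(x).array()
  def decode(bytes: Array[Byte]): Long = ByteBuffer.wrap(bytes).getLong()

  def viaInteger64(x: Long): Long = decode(encode(x))

  def main(args: Array[String]): Unit = {
    val id = 9007199254740993L // 2^53 + 1, e.g. a large row id
    println(viaNumeric(id))   // 9007199254740992: off by one
    println(viaInteger64(id)) // 9007199254740993: exact
  }
}
{code}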

> Support using 64-bit long type in SparkR
> ----------------------------------------
>
>                 Key: SPARK-12360
>                 URL: https://issues.apache.org/jira/browse/SPARK-12360
>             Project: Spark
>          Issue Type: New Feature
>          Components: SparkR
>    Affects Versions: 1.5.2
>            Reporter: Sun Rui
>
> R has no native support for 64-bit integers, while in the Scala/Java API some 
> methods have one or more arguments of long type. Currently, for parameters of 
> long type in such methods, we only support passing an integer cast from a 
> numeric to the Scala/Java side. This may be a problem for large data sets.
> Storing a 64-bit integer in a double does not work in general: a double has a 
> 53-bit significand, so some 64-bit integers above 2^53 cannot be represented 
> exactly, and x and x+1 can't always be distinguished.
> There is a bit64 package 
> (https://cran.r-project.org/web/packages/bit64/index.html) on CRAN that 
> supports vectors of 64-bit integers. We can investigate whether it can be used 
> for this purpose.
> Two open questions:
> 1. Is its license acceptable?
> 2. It would make SparkR depend on a non-base third-party package, which may 
> complicate deployment.
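The x versus x+1 problem from the issue description can be checked directly: an IEEE 754 double has a 53-bit significand, so every integer up to 2^53 in magnitude is exact, but 2^53 + 1 rounds to 2^53. A small Scala sketch (the object and method names are illustrative only):

{code:scala}
// Illustrative check of which Long values survive conversion to a double.
object DoublePrecisionSketch {
  // Every integer with |x| <= 2^53 is exactly representable as a double.
  val MaxExactInteger: Long = 1L << 53

  // True if the double nearest to x is numerically equal to x.
  // BigDecimal compares exactly, so saturation in Double.toLong
  // (e.g. for Long.MaxValue) cannot hide a rounding error.
  def exactInDouble(x: Long): Boolean =
    new java.math.BigDecimal(x.toDouble).compareTo(java.math.BigDecimal.valueOf(x)) == 0

  def main(args: Array[String]): Unit = {
    val x = MaxExactInteger
    println(exactInDouble(x))               // true
    println(exactInDouble(x + 1))           // false: rounds to 2^53
    println(x.toDouble == (x + 1).toDouble) // true: x and x+1 collide
  }
}
{code}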



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
