Chunxi Zhang created SPARK-6931:
-----------------------------------

             Summary: python: struct.pack('!q', value) in write_long(value, stream) in serializers.py require int(but doesn't raise exceptions in common cases)
                 Key: SPARK-6931
                 URL: https://issues.apache.org/jira/browse/SPARK-6931
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.3.0
            Reporter: Chunxi Zhang
            Priority: Critical


When I map a function from my own feature-calculation module, Spark raises:
Traceback (most recent call last):
  File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/daemon.py", line 162, in manager
    code = worker(sock)
  File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/daemon.py", line 60, in worker
    worker_main(infile, outfile)
  File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/worker.py", line 115, in main
    report_times(outfile, boot_time, init_time, finish_time)
  File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/worker.py", line 40, in report_times
    write_long(1000 * boot, outfile)
  File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/serializers.py", line 518, in write_long
    stream.write(struct.pack("!q", value))
DeprecationWarning: integer argument expected, got float

So I opened serializers.py and printed the value out: it's a float, coming from 1000 * time.time().

When I remove my lib, or add an rdd.count() before mapping my lib, this bug doesn't appear.

So I edited the function to:

def write_long(value, stream):
    stream.write(struct.pack("!q", int(value)))  # added int(value)

and everything seems fine…
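For reference, a self-contained sketch of the patched function (the io.BytesIO stream and the round-trip check are mine, just for illustration):

```python
import io
import struct

def write_long(value, stream):
    # struct's "q" conversion code requires an integer, so coerce
    # explicitly; int() truncates a float such as 1000 * time.time().
    stream.write(struct.pack("!q", int(value)))

# Round-trip check against an in-memory stream.
buf = io.BytesIO()
write_long(1000 * 1234.567, buf)
data = buf.getvalue()
print(len(data))                      # a packed "!q" is always 8 bytes
print(struct.unpack("!q", data)[0])   # the truncated integer value
```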

According to Note (3) in Python's struct documentation (https://docs.python.org/2/library/struct.html), the value should be an int (for "q"); if it isn't, pack() first tries __index__(), and failing that __int__(), but since conversion via __int__() is deprecated it raises a DeprecationWarning. A float doesn't have __index__() but does have __int__(), so it should trigger the warning every time.
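A quick illustration of that note (my own snippet; the behavior differs by Python version — on Python 3, struct.pack() rejects floats outright with struct.error, while on Python 2.x the float was truncated via __int__() and only a DeprecationWarning was emitted):

```python
import struct

# An int packs fine under the "q" (signed 64-bit, big-endian) code.
packed = struct.pack("!q", 111)
print(len(packed))  # always 8 bytes

# A float has __int__ but not __index__; on Python 3 struct.pack
# refuses it, on Python 2 it only warned.
try:
    struct.pack("!q", 111.1)
    print("packed without complaint")  # Python 2.x behavior
except struct.error as exc:
    print("rejected:", exc)            # Python 3 behavior
```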

But, as you can see, in normal cases it doesn't raise the exception and the code works perfectly, and executing struct.pack('!q', 111.1) in the console or in a clean file doesn't raise anything either… I can hardly tell how my lib could affect a time.time() value passed to struct.pack(); it might be a bug in Python itself, or something else.
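For what it's worth, the traceback above shows the DeprecationWarning being raised as an exception, which only happens when the warnings filter has been escalated to "error". A library can do that globally (just a possibility, not confirmed for my lib):

```python
import warnings

# By default a DeprecationWarning is printed (or hidden), not raised.
# Escalating the filter turns every DeprecationWarning into an exception,
# which would make the struct.pack() call in write_long fail as observed.
warnings.simplefilter("error", DeprecationWarning)

try:
    warnings.warn("integer argument expected, got float", DeprecationWarning)
except DeprecationWarning as exc:
    print("raised as exception:", exc)
```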

Anyway, this value should be an int, so add an int() around it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
