Artur Sukhenko created SPARK-20244:
--------------------------------------

             Summary: Incorrect input size in UI with pyspark
                 Key: SPARK-20244
                 URL: https://issues.apache.org/jira/browse/SPARK-20244
             Project: Spark
          Issue Type: Bug
          Components: Web UI
    Affects Versions: 2.1.0, 2.0.0
            Reporter: Artur Sukhenko
            Priority: Minor

In the Spark UI (Details for Stage), Input Size is shown as 64.0 KB when the job is run from PySparkShell.
It is also incorrect in the Tasks table:

64.0 KB / 132120575 in pyspark
252.0 MB / 132120575 in spark-shell

I will attach screenshots.

Steps to reproduce:

Run this to generate a big file (press Ctrl+C after 5-6 seconds):

$ yes > /tmp/yes.txt
$ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/
$ ./bin/pyspark

{code}
Python 2.7.5 (default, Nov  6 2016, 00:28:07)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Python version 2.7.5 (default, Nov  6 2016 00:28:07)
SparkSession available as 'spark'.{code}

>>> a = sc.textFile("/tmp/yes.txt")
>>> a.count()

Open the Spark UI and check Stage 0.
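
As a rough sanity check on the two figures above, the expected input size can be estimated from the record count. This is only a sketch: it assumes every record is the 2-byte line "y\n" that `yes` writes by default, and that the UI formats sizes in binary units (1 KB = 1024 bytes). The helper names `expected_input_bytes` and `format_size` are hypothetical and are not part of Spark.

```python
# Sketch only: estimate the input size implied by the record count.
# Assumption: each record is "y\n" (2 bytes), the default output of `yes`.
# Assumption: sizes are displayed in binary units (1 KB = 1024 bytes).

BYTES_PER_RECORD = 2  # len("y\n")


def expected_input_bytes(records):
    """Bytes read if every record is one 2-byte line."""
    return records * BYTES_PER_RECORD


def format_size(num_bytes):
    """Hypothetical formatter: one decimal place, binary units."""
    for unit, factor in (("GB", 1 << 30), ("MB", 1 << 20), ("KB", 1 << 10)):
        if num_bytes >= factor:
            return "%.1f %s" % (float(num_bytes) / factor, unit)
    return "%d B" % num_bytes


records = 132120575  # record count reported in the Tasks table
size = expected_input_bytes(records)
print("%d bytes -> %s" % (size, format_size(size)))
```

Under these assumptions, 132120575 records correspond to 264241150 bytes, which formats as 252.0 MB. That agrees with the spark-shell figure, while 64.0 KB (65536 bytes) would account for only a small fraction of the records counted.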