[jira] [Updated] (SPARK-25717) Insert overwrite a recreated external and partitioned table may result in incorrect query results

2018-10-12 Thread Jinhua Fu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinhua Fu updated SPARK-25717:
--
Description: 
Consider the following scenario:
{code:java}
spark.range(100).createTempView("temp")
(0 until 3).foreach { _ =>
  spark.sql("drop table if exists tableA")
  spark.sql("create table if not exists tableA(a int) partitioned by (p int) location 'file:/e:/study/warehouse/tableA'")
  spark.sql("insert overwrite table tableA partition(p=1) select * from temp")
  spark.sql("select count(1) from tableA where p=1").show
}
{code}
We expect the count to always be 100, but the actual results are as follows:
{code:java}
+--------+
|count(1)|
+--------+
|     100|
+--------+

+--------+
|count(1)|
+--------+
|     200|
+--------+

+--------+
|count(1)|
+--------+
|     300|
+--------+
{code}
When Spark executes an `insert overwrite` command, it first looks up the 
historical partition and then deletes it from the file system.

But for a recreated external, partitioned table, the partitions were all 
deleted by the `drop table` command while the underlying data files were left 
in place. The historical data is therefore preserved, which makes the query 
results incorrect.
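The interaction can be sketched with a toy model (an illustration only, not Spark code; all names here are hypothetical): `insert overwrite` deletes data only for partitions the metastore knows about, while `drop table` on an external table clears the metastore entries but keeps the files, so each round of the loop leaves one more data file behind:

```java
import java.util.*;

public class OverwriteRepro {
    static Set<String> files = new HashSet<>();       // data files on the file system
    static Set<String> partitions = new HashSet<>();  // metastore partition entries

    static void dropExternalTable() {
        partitions.clear();  // metadata is removed, but files stay (external table)
    }

    static void insertOverwrite(String file) {
        if (partitions.contains("p=1")) {
            // only data of partitions known to the metastore is deleted
            files.removeIf(f -> f.startsWith("p=1/"));
        }
        partitions.add("p=1");
        files.add("p=1/" + file);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            dropExternalTable();               // partitions forgotten, files kept
            insertOverwrite("part-" + i);      // nothing to delete => data piles up
            System.out.println("rows visible: " + files.size() * 100);
        }
        // prints 100, 200, 300 -- matching the reported counts
    }
}
```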

 

  was:
Consider the following scenario:
{code:java}
spark.range(100).createTempView("temp")
(0 until 3).foreach { _ =>
  spark.sql("drop table if exists tableA")
  spark.sql("create table if not exists tableA(a int) partitioned by (p int) location 'file:/e:/study/warehouse/tableA'")
  spark.sql("insert overwrite table tableA partition(p=1) select * from temp")
  spark.sql("select count(1) from tableA where p=1").show
}
{code}
We expect the count to always be 100, but the actual results are as follows:
{code:java}
+--------+
|count(1)|
+--------+
|     100|
+--------+

+--------+
|count(1)|
+--------+
|     200|
+--------+

+--------+
|count(1)|
+--------+
|     300|
+--------+
{code}
When Spark executes an `insert overwrite` command, it first looks up the 
historical partition and then deletes it from the file system.

But for a recreated external, partitioned table, the partitions were all 
deleted by the `drop table` command. So the historical data is preserved, 
which makes the query results incorrect.

 


> Insert overwrite a recreated external and partitioned table may result in 
> incorrect query results
> -
>
> Key: SPARK-25717
> URL: https://issues.apache.org/jira/browse/SPARK-25717
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: Jinhua Fu
>Priority: Major
>
> Consider the following scenario:
> {code:java}
> spark.range(100).createTempView("temp")
> (0 until 3).foreach { _ =>
>   spark.sql("drop table if exists tableA")
>   spark.sql("create table if not exists tableA(a int) partitioned by (p int) location 'file:/e:/study/warehouse/tableA'")
>   spark.sql("insert overwrite table tableA partition(p=1) select * from temp")
>   spark.sql("select count(1) from tableA where p=1").show
> }
> {code}
> We expect the count to always be 100, but the actual results are as follows:
> {code:java}
> +--------+
> |count(1)|
> +--------+
> |     100|
> +--------+
>
> +--------+
> |count(1)|
> +--------+
> |     200|
> +--------+
>
> +--------+
> |count(1)|
> +--------+
> |     300|
> +--------+
> {code}
> When Spark executes an `insert overwrite` command, it first looks up the 
> historical partition and then deletes it from the file system.
> But for a recreated external, partitioned table, the partitions were all 
> deleted by the `drop table` command while the underlying data files were left 
> in place. The historical data is therefore preserved, which makes the query 
> results incorrect.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25717) Insert overwrite a recreated external and partitioned table may result in incorrect query results

2018-10-12 Thread Jinhua Fu (JIRA)
Jinhua Fu created SPARK-25717:
-

 Summary: Insert overwrite a recreated external and partitioned 
table may result in incorrect query results
 Key: SPARK-25717
 URL: https://issues.apache.org/jira/browse/SPARK-25717
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.2
Reporter: Jinhua Fu


Consider the following scenario:
{code:java}
spark.range(100).createTempView("temp")
(0 until 3).foreach { _ =>
  spark.sql("drop table if exists tableA")
  spark.sql("create table if not exists tableA(a int) partitioned by (p int) location 'file:/e:/study/warehouse/tableA'")
  spark.sql("insert overwrite table tableA partition(p=1) select * from temp")
  spark.sql("select count(1) from tableA where p=1").show
}
{code}
We expect the count to always be 100, but the actual results are as follows:
{code:java}
+--------+
|count(1)|
+--------+
|     100|
+--------+

+--------+
|count(1)|
+--------+
|     200|
+--------+

+--------+
|count(1)|
+--------+
|     300|
+--------+
{code}
When Spark executes an `insert overwrite` command, it first looks up the 
historical partition and then deletes it from the file system.

But for a recreated external, partitioned table, the partitions were all 
deleted by the `drop table` command. So the historical data is preserved, 
which makes the query results incorrect.

 






[jira] [Created] (SPARK-25701) Supports calculation of table statistics from partition's catalog statistics

2018-10-10 Thread Jinhua Fu (JIRA)
Jinhua Fu created SPARK-25701:
-

 Summary: Supports calculation of table statistics from partition's 
catalog statistics
 Key: SPARK-25701
 URL: https://issues.apache.org/jira/browse/SPARK-25701
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.2
Reporter: Jinhua Fu


When obtaining table statistics, if the `totalSize` of the table is not 
defined, we fall back to HDFS to get the table statistics when 
`spark.sql.statistics.fallBackToHdfs` is `true`; otherwise the default 
value (`spark.sql.defaultSizeInBytes`) is used.

Fortunately, in most cases the data is written into the table by an insert 
command, which saves the data size in the metadata, so it's possible to use 
the partition metadata to calculate the table statistics.
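A sketch of the idea in plain Java (a hypothetical helper, not Spark's `CatalogStatistics` API): sum the per-partition sizes recorded by insert commands, and fall back only when some partition has no recorded size:

```java
import java.util.*;

public class PartitionStatsSketch {
    // Returns the summed size of all partitions, or -1L when any partition
    // lacks a recorded size (the caller would then fall back to HDFS or to
    // the default size, as today).
    static long tableSizeFromPartitions(List<Long> partitionSizes) {
        long total = 0;
        for (Long size : partitionSizes) {
            if (size == null) {
                return -1L;  // unknown partition size: cannot compute from catalog
            }
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFromPartitions(Arrays.asList(100L, 250L)));
        System.out.println(tableSizeFromPartitions(Arrays.asList(100L, null)));
    }
}
```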






[jira] [Updated] (SPARK-25404) Staging path may not on the expected place when table path contains the stagingDir string

2018-09-11 Thread Jinhua Fu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinhua Fu updated SPARK-25404:
--
Description: 
Consider the following scenario:

 
{code:java}
SET hive.exec.stagingdir=temp;
CREATE TABLE tempTableA(key int)  location '/spark/temp/tempTableA';
INSERT OVERWRITE TABLE tempTableA SELECT 1;
{code}
We expect the staging path to be under the table path, such as 
'/spark/temp/tempTableA/.hive-stagingXXX' (SPARK-20594), but actually it is 
'/spark/tempXXX'.

I'm not quite sure why we use the 'if ... else ...' when getting the 
stagingDir, but it may be the cause of this bug.

 
{code:java}
// SaveAsHiveFile.scala
private def getStagingDir(
    inputPath: Path,
    hadoopConf: Configuration,
    stagingDir: String): Path = {
  ..
  var stagingPathName: String =
    if (inputPathName.indexOf(stagingDir) == -1) {
      new Path(inputPathName, stagingDir).toString
    } else {
      // 'indexOf' finds the first occurrence of stagingDir, which may not be
      // the expected position; this may be the cause of this bug.
      inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
    }
  ..
}
{code}
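The suspect logic can be reproduced standalone (a simplified sketch without Hadoop's `Path` type; `stagingPathName` is a hypothetical helper distilled from the snippet above): when the table path itself contains the stagingDir string, `indexOf` matches that first occurrence and the staging path escapes the table directory:

```java
public class StagingDirRepro {
    // Simplified from the getStagingDir logic above (no Hadoop Path type).
    static String stagingPathName(String inputPathName, String stagingDir) {
        int idx = inputPathName.indexOf(stagingDir);
        if (idx == -1) {
            // stagingDir not present anywhere: place it under the table path
            return inputPathName + "/" + stagingDir;
        }
        // indexOf matches the FIRST occurrence, which here belongs to the
        // table path itself, not to an existing staging directory
        return inputPathName.substring(0, idx + stagingDir.length());
    }

    public static void main(String[] args) {
        // Table path '/spark/temp/tempTableA' contains the stagingDir "temp":
        System.out.println(stagingPathName("/spark/temp/tempTableA", "temp"));
        // prints "/spark/temp" -- the staging path lands outside the table path
    }
}
```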
 

 

 

  was:
Consider the following scenario:

 
{code:java}
SET hive.exec.stagingdir=temp;
CREATE TABLE tempTableA(key int)  location '/spark/temp/tempTableA';
INSERT OVERWRITE TABLE tempTableA SELECT 1;
{code}
We expect the staging path to be under the table path, such as 
'/spark/temp/tempTableA/.hive-stagingXXX' (SPARK-20594), but actually it is 
'/spark/tempXXX'.

I'm not quite sure why we use the 'if ... else ...' when getting the 
stagingDir, but it may be the cause of this bug.

 
{code:java}
// SaveAsHiveFile.scala
private def getStagingDir(
    inputPath: Path,
    hadoopConf: Configuration,
    stagingDir: String): Path = {
  ..
  var stagingPathName: String =
    if (inputPathName.indexOf(stagingDir) == -1) {
      new Path(inputPathName, stagingDir).toString
    } else {
      // 'indexOf' finds the first occurrence of stagingDir, which may not be
      // the expected position; this may be the cause of this bug.
      inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
    }
  ..
}
{code}
 

 

 


> Staging path may not on the expected place when table path contains the 
> stagingDir string
> -
>
> Key: SPARK-25404
> URL: https://issues.apache.org/jira/browse/SPARK-25404
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Jinhua Fu
>Priority: Minor
>
> Consider the following scenario:
>  
> {code:java}
> SET hive.exec.stagingdir=temp;
> CREATE TABLE tempTableA(key int)  location '/spark/temp/tempTableA';
> INSERT OVERWRITE TABLE tempTableA SELECT 1;
> {code}
> We expect the staging path to be under the table path, such as 
> '/spark/temp/tempTableA/.hive-stagingXXX' (SPARK-20594), but actually it is 
> '/spark/tempXXX'.
> I'm not quite sure why we use the 'if ... else ...' when getting the 
> stagingDir, but it may be the cause of this bug.
>  
> {code:java}
> // SaveAsHiveFile.scala
> private def getStagingDir(
>     inputPath: Path,
>     hadoopConf: Configuration,
>     stagingDir: String): Path = {
>   ..
>   var stagingPathName: String =
>     if (inputPathName.indexOf(stagingDir) == -1) {
>       new Path(inputPathName, stagingDir).toString
>     } else {
>       // 'indexOf' finds the first occurrence of stagingDir, which may not be
>       // the expected position; this may be the cause of this bug.
>       inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
>     }
>   ..
> }
> {code}
>  
>  
>  






[jira] [Updated] (SPARK-25404) Staging path may not on the expected place when table path contains the stagingDir string

2018-09-11 Thread Jinhua Fu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinhua Fu updated SPARK-25404:
--
Description: 
Consider the following scenario:

 
{code:java}
SET hive.exec.stagingdir=temp;
CREATE TABLE tempTableA(key int)  location '/spark/temp/tempTableA';
INSERT OVERWRITE TABLE tempTableA SELECT 1;
{code}
We expect the staging path to be under the table path, such as 
'/spark/temp/tempTableA/.hive-stagingXXX' (SPARK-20594), but actually it is 
'/spark/tempXXX'.

I'm not quite sure why we use the 'if ... else ...' when getting the 
stagingDir, but it may be the cause of this bug.

 
{code:java}
// SaveAsHiveFile.scala
private def getStagingDir(
    inputPath: Path,
    hadoopConf: Configuration,
    stagingDir: String): Path = {
  ..
  var stagingPathName: String =
    if (inputPathName.indexOf(stagingDir) == -1) {
      new Path(inputPathName, stagingDir).toString
    } else {
      // 'indexOf' finds the first occurrence of stagingDir, which may not be
      // the expected position; this may be the cause of this bug.
      inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
    }
  ..
}
{code}
 

 

 

  was:
Consider the following scenario:

 
{code:java}
SET hive.exec.stagingdir=temp;
CREATE TABLE tempTableA(key int)  location '/spark/temp/tempTableA';
INSERT OVERWRITE TABLE tempTableA SELECT 1;
{code}
We expect the staging path to be under the table path, such as 
'/spark/temp/tempTableA/.hive-stagingXXX' (SPARK-20594), but actually it is 
'/spark/tempXXX'.

I'm not quite sure why we use the 'if ... else ...' when getting the 
stagingDir, but it may be the cause of this bug.

 
{code:java}
private def getStagingDir(
    inputPath: Path,
    hadoopConf: Configuration,
    stagingDir: String): Path = {
  ..
  var stagingPathName: String =
    if (inputPathName.indexOf(stagingDir) == -1) {
      new Path(inputPathName, stagingDir).toString
    } else {
      inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
    }
  ..
}
{code}
 

 

 


> Staging path may not on the expected place when table path contains the 
> stagingDir string
> -
>
> Key: SPARK-25404
> URL: https://issues.apache.org/jira/browse/SPARK-25404
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Jinhua Fu
>Priority: Minor
>
> Consider the following scenario:
>  
> {code:java}
> SET hive.exec.stagingdir=temp;
> CREATE TABLE tempTableA(key int)  location '/spark/temp/tempTableA';
> INSERT OVERWRITE TABLE tempTableA SELECT 1;
> {code}
> We expect the staging path to be under the table path, such as 
> '/spark/temp/tempTableA/.hive-stagingXXX' (SPARK-20594), but actually it is 
> '/spark/tempXXX'.
> I'm not quite sure why we use the 'if ... else ...' when getting the 
> stagingDir, but it may be the cause of this bug.
>  
> {code:java}
> // SaveAsHiveFile.scala
> private def getStagingDir(
>     inputPath: Path,
>     hadoopConf: Configuration,
>     stagingDir: String): Path = {
>   ..
>   var stagingPathName: String =
>     if (inputPathName.indexOf(stagingDir) == -1) {
>       new Path(inputPathName, stagingDir).toString
>     } else {
>       // 'indexOf' finds the first occurrence of stagingDir, which may not be
>       // the expected position; this may be the cause of this bug.
>       inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
>     }
>   ..
> }
> {code}
>  
>  
>  






[jira] [Created] (SPARK-25404) Staging path may not on the expected place when table path contains the stagingDir string

2018-09-11 Thread Jinhua Fu (JIRA)
Jinhua Fu created SPARK-25404:
-

 Summary: Staging path may not on the expected place when table 
path contains the stagingDir string
 Key: SPARK-25404
 URL: https://issues.apache.org/jira/browse/SPARK-25404
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.1
Reporter: Jinhua Fu


Consider the following scenario:

 
{code:java}
SET hive.exec.stagingdir=temp;
CREATE TABLE tempTableA(key int)  location '/spark/temp/tempTableA';
INSERT OVERWRITE TABLE tempTableA SELECT 1;
{code}
We expect the staging path to be under the table path, such as 
'/spark/temp/tempTableA/.hive-stagingXXX' (SPARK-20594), but actually it is 
'/spark/tempXXX'.

I'm not quite sure why we use the 'if ... else ...' when getting the 
stagingDir, but it may be the cause of this bug.

 
{code:java}
private def getStagingDir(
    inputPath: Path,
    hadoopConf: Configuration,
    stagingDir: String): Path = {
  ..
  var stagingPathName: String =
    if (inputPathName.indexOf(stagingDir) == -1) {
      new Path(inputPathName, stagingDir).toString
    } else {
      inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
    }
  ..
}
{code}
 

 

 






[jira] [Updated] (SPARK-20758) Add Constant propagation optimization

2018-05-17 Thread Jinhua Fu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinhua Fu updated SPARK-20758:
--
Issue Type: New JIRA Project  (was: Improvement)

> Add Constant propagation optimization
> -
>
> Key: SPARK-20758
> URL: https://issues.apache.org/jira/browse/SPARK-20758
> Project: Spark
>  Issue Type: New JIRA Project
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Tejas Patil
>Assignee: Tejas Patil
>Priority: Minor
> Fix For: 2.3.0
>
>
> Constant propagation involves substituting attributes that can be statically 
> evaluated in expressions. It's a pretty common optimization in the compiler 
> world.
> eg.
> {noformat}
> SELECT * FROM table WHERE i = 5 AND j = i + 3
> {noformat}
> can be re-written as:
> {noformat}
> SELECT * FROM table WHERE i = 5 AND j = 8
> {noformat}
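As a toy illustration of the rewrite above (a hypothetical expression model, not Catalyst's actual rule): equality predicates bind attributes to literals, which are then substituted and folded in other predicates:

```java
import java.util.*;

public class ConstantPropagationSketch {
    // Substitute a known attribute value into "attr + addend" and fold it to
    // a literal; otherwise return the expression unchanged as a string.
    static String foldAdd(Map<String, Integer> bindings, String attr, int addend) {
        Integer value = bindings.get(attr);
        return (value == null) ? attr + " + " + addend
                               : String.valueOf(value + addend);
    }

    public static void main(String[] args) {
        // Binding learned from the predicate i = 5
        Map<String, Integer> bindings = new HashMap<>();
        bindings.put("i", 5);
        // j = i + 3 folds to j = 8; k = m + 3 stays symbolic
        System.out.println("j = " + foldAdd(bindings, "i", 3));
        System.out.println("k = " + foldAdd(bindings, "m", 3));
    }
}
```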






[jira] [Created] (SPARK-21786) The 'spark.sql.parquet.compression.codec' configuration doesn't take effect on tables with partition field(s)

2017-08-18 Thread Jinhua Fu (JIRA)
Jinhua Fu created SPARK-21786:
-

 Summary: The 'spark.sql.parquet.compression.codec' configuration 
doesn't take effect on tables with partition field(s)
 Key: SPARK-21786
 URL: https://issues.apache.org/jira/browse/SPARK-21786
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.0
Reporter: Jinhua Fu


For tables created as below, 'spark.sql.parquet.compression.codec' doesn't 
take effect when inserting data. And because the default compression codec is 
'uncompressed', if I want to change the compression codec, I have to change it 
via 'set parquet.compression='.

In contrast, tables without any partition field work normally with 
'spark.sql.parquet.compression.codec', and the default compression codec is 
'snappy'; but then 'parquet.compression' seems to no longer take effect.

Should we use the 'spark.sql.parquet.compression.codec' configuration uniformly?


{code:java}
CREATE TABLE Test_Parquet(provincecode int, citycode int, districtcode int)
PARTITIONED BY (p_provincecode int)
STORED AS PARQUET;

INSERT OVERWRITE TABLE Test_Parquet select * from TableB;
{code}






[jira] [Created] (SPARK-21135) On history server page,duration of incompleted applications should be hidden instead of showing up as 0

2017-06-19 Thread Jinhua Fu (JIRA)
Jinhua Fu created SPARK-21135:
-

 Summary: On history server page,duration of incompleted 
applications should be hidden instead of showing up as 0
 Key: SPARK-21135
 URL: https://issues.apache.org/jira/browse/SPARK-21135
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 2.2.1
Reporter: Jinhua Fu
Priority: Minor


On the history server page, the duration of incomplete applications should be 
hidden instead of showing up as 0.
In addition, an application that aborts abnormally (such as one killed in the 
background, or one whose driver goes down) will always be treated as an 
incomplete application, and I'm not sure if this is a problem.






[jira] [Updated] (SPARK-21018) "Completed Jobs" and "Completed Stages" support pagination

2017-06-08 Thread Jinhua Fu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinhua Fu updated SPARK-21018:
--
Description: 
When using the Thrift Server, the number of jobs and stages may be very large, 
and if not paginated, the page will be very long and slow to load, especially 
when spark.ui.retainedJobs is set to a large value. So I suggest that 
"Completed Jobs" and "Completed Stages" support pagination.

I'd like to change them to a paged display similar to the tasks table on the 
"Details for Stage" page.

  was:When using the Thrift Server, the number of jobs and stages may be very 
large, and if not paginated, the page will be very long and slow to load, 
especially when spark.ui.retainedJobs is set to a large value. So I suggest 
that "Completed Jobs" and "Completed Stages" support pagination.


> "Completed Jobs" and "Completed Stages" support pagination
> --
>
> Key: SPARK-21018
> URL: https://issues.apache.org/jira/browse/SPARK-21018
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.0.2
>Reporter: Jinhua Fu
>Priority: Minor
> Attachments: CompletedJobs.png, PagedTasks.png
>
>
> When using the Thrift Server, the number of jobs and stages may be very large, 
> and if not paginated, the page will be very long and slow to load, especially 
> when spark.ui.retainedJobs is set to a large value. So I suggest that 
> "Completed Jobs" and "Completed Stages" support pagination.
> I'd like to change them to a paged display similar to the tasks table on the 
> "Details for Stage" page.






[jira] [Updated] (SPARK-21018) "Completed Jobs" and "Completed Stages" support pagination

2017-06-08 Thread Jinhua Fu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinhua Fu updated SPARK-21018:
--
Attachment: PagedTasks.png

> "Completed Jobs" and "Completed Stages" support pagination
> --
>
> Key: SPARK-21018
> URL: https://issues.apache.org/jira/browse/SPARK-21018
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.0.2
>Reporter: Jinhua Fu
>Priority: Minor
> Attachments: CompletedJobs.png, PagedTasks.png
>
>
> When using the Thrift Server, the number of jobs and stages may be very large, 
> and if not paginated, the page will be very long and slow to load, especially 
> when spark.ui.retainedJobs is set to a large value. So I suggest that 
> "Completed Jobs" and "Completed Stages" support pagination.






[jira] [Updated] (SPARK-21018) "Completed Jobs" and "Completed Stages" support pagination

2017-06-08 Thread Jinhua Fu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinhua Fu updated SPARK-21018:
--
Attachment: CompletedJobs.png

> "Completed Jobs" and "Completed Stages" support pagination
> --
>
> Key: SPARK-21018
> URL: https://issues.apache.org/jira/browse/SPARK-21018
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.0.2
>Reporter: Jinhua Fu
>Priority: Minor
> Attachments: CompletedJobs.png
>
>
> When using the Thrift Server, the number of jobs and stages may be very large, 
> and if not paginated, the page will be very long and slow to load, especially 
> when spark.ui.retainedJobs is set to a large value. So I suggest that 
> "Completed Jobs" and "Completed Stages" support pagination.






[jira] [Created] (SPARK-21018) "Completed Jobs" and "Completed Stages" support pagination

2017-06-08 Thread Jinhua Fu (JIRA)
Jinhua Fu created SPARK-21018:
-

 Summary: "Completed Jobs" and "Completed Stages" support pagination
 Key: SPARK-21018
 URL: https://issues.apache.org/jira/browse/SPARK-21018
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 2.0.2
Reporter: Jinhua Fu
Priority: Minor


When using the Thrift Server, the number of jobs and stages may be very large, 
and if not paginated, the page will be very long and slow to load, especially 
when spark.ui.retainedJobs is set to a large value. So I suggest that 
"Completed Jobs" and "Completed Stages" support pagination.






[jira] [Commented] (SPARK-20591) Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist

2017-05-04 Thread Jinhua Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996681#comment-15996681
 ] 

Jinhua Fu commented on SPARK-20591:
---

Does this need to be fixed, and may I take this PR?

> Succeeded tasks num not equal in job page and job detail page on spark web ui 
> when speculative task(s) exist
> 
>
> Key: SPARK-20591
> URL: https://issues.apache.org/jira/browse/SPARK-20591
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.2
>Reporter: Jinhua Fu
>Priority: Minor
> Attachments: job detail page(stages).png, job page.png
>
>
> When spark.speculation is enabled and there are speculative tasks, the 
> succeeded-task count on the job page includes speculative tasks, but they are 
> not included on the job detail page (job stages page).
> When the job page shows more succeeded tasks than total tasks, suggesting 
> some tasks ran slowly, I want to know which tasks and why, but I have to 
> check every stage to find them, because speculative tasks are not included in 
> the per-stage succeeded-task counts.
> Can it be improved?
> Update: two screenshots attached. The succeeded-task count is 557 on the job 
> page but 550 (by sum) on the job detail page (stages); the extra 7 tasks are 
> speculative tasks.






[jira] [Updated] (SPARK-20591) Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist

2017-05-04 Thread Jinhua Fu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinhua Fu updated SPARK-20591:
--
Description: 
When spark.speculation is enabled and there are speculative tasks, the 
succeeded-task count on the job page includes speculative tasks, but they are 
not included on the job detail page (job stages page).
When the job page shows more succeeded tasks than total tasks, suggesting some 
tasks ran slowly, I want to know which tasks and why, but I have to check every 
stage to find them, because speculative tasks are not included in the per-stage 
succeeded-task counts.
Can it be improved?

Update: two screenshots attached. The succeeded-task count is 557 on the job 
page but 550 (by sum) on the job detail page (stages); the extra 7 tasks are 
speculative tasks.
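A toy model of the mismatch (an assumption about the counting, not Spark's UI code): the job page counts succeeded task attempts, while the stage pages count distinct succeeded tasks, so each speculative duplicate inflates the job-page number by one:

```java
import java.util.*;

public class SpeculativeCountSketch {
    // Distinct task indices among succeeded attempts (what stage pages show).
    static int distinctCount(int[] succeededAttempts) {
        Set<Integer> distinctTasks = new HashSet<>();
        for (int taskIndex : succeededAttempts) {
            distinctTasks.add(taskIndex);
        }
        return distinctTasks.size();
    }

    public static void main(String[] args) {
        // Task index of each succeeded attempt; task 3 also succeeded
        // speculatively, so it appears twice.
        int[] attempts = {0, 1, 2, 3, 3};
        System.out.println("job page count:   " + attempts.length);         // 5
        System.out.println("stage page count: " + distinctCount(attempts)); // 4
    }
}
```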

  was:
When spark.speculation is enabled and there are speculative tasks, the 
succeeded-task count on the job page includes speculative tasks, but they are 
not included on the job detail page (job stages page).
When the job page shows more succeeded tasks than total tasks, suggesting some 
tasks ran slowly, I want to know which tasks and why, but I have to check every 
stage to find them, because speculative tasks are not included in the per-stage 
succeeded-task counts.
Can it be improved?


> Succeeded tasks num not equal in job page and job detail page on spark web ui 
> when speculative task(s) exist
> 
>
> Key: SPARK-20591
> URL: https://issues.apache.org/jira/browse/SPARK-20591
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.2
>Reporter: Jinhua Fu
>Priority: Minor
> Attachments: job detail page(stages).png, job page.png
>
>
> When spark.speculation is enabled and there are speculative tasks, the 
> succeeded-task count on the job page includes speculative tasks, but they are 
> not included on the job detail page (job stages page).
> When the job page shows more succeeded tasks than total tasks, suggesting 
> some tasks ran slowly, I want to know which tasks and why, but I have to 
> check every stage to find them, because speculative tasks are not included in 
> the per-stage succeeded-task counts.
> Can it be improved?
> Update: two screenshots attached. The succeeded-task count is 557 on the job 
> page but 550 (by sum) on the job detail page (stages); the extra 7 tasks are 
> speculative tasks.






[jira] [Updated] (SPARK-20591) Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist

2017-05-04 Thread Jinhua Fu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinhua Fu updated SPARK-20591:
--
Attachment: job detail page(stages).png

> Succeeded tasks num not equal in job page and job detail page on spark web ui 
> when speculative task(s) exist
> 
>
> Key: SPARK-20591
> URL: https://issues.apache.org/jira/browse/SPARK-20591
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.2
>Reporter: Jinhua Fu
>Priority: Minor
> Attachments: job detail page(stages).png, job page.png
>
>
> When spark.speculation is enabled and there are speculative tasks, the 
> succeeded-task count on the job page includes speculative tasks, but they are 
> not included on the job detail page (job stages page).
> When the job page shows more succeeded tasks than total tasks, suggesting 
> some tasks ran slowly, I want to know which tasks and why, but I have to 
> check every stage to find them, because speculative tasks are not included in 
> the per-stage succeeded-task counts.
> Can it be improved?






[jira] [Updated] (SPARK-20591) Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist

2017-05-04 Thread Jinhua Fu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinhua Fu updated SPARK-20591:
--
Attachment: (was: screenshot-1.png)

> Succeeded tasks num not equal in job page and job detail page on spark web ui 
> when speculative task(s) exist
> 
>
> Key: SPARK-20591
> URL: https://issues.apache.org/jira/browse/SPARK-20591
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.2
>Reporter: Jinhua Fu
>Priority: Minor
> Attachments: job detail page(stages).png, job page.png
>
>
> When spark.speculation is enabled and there are speculative tasks, the 
> succeeded-task count on the job page includes speculative tasks, but they are 
> not included on the job detail page (job stages page).
> When the job page shows more succeeded tasks than total tasks, suggesting 
> some tasks ran slowly, I want to know which tasks and why, but I have to 
> check every stage to find them, because speculative tasks are not included in 
> the per-stage succeeded-task counts.
> Can it be improved?






[jira] [Updated] (SPARK-20591) Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist

2017-05-04 Thread Jinhua Fu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinhua Fu updated SPARK-20591:
--
Attachment: screenshot-1.png

> Succeeded tasks num not equal in job page and job detail page on spark web ui 
> when speculative task(s) exist
> 
>
> Key: SPARK-20591
> URL: https://issues.apache.org/jira/browse/SPARK-20591
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.2
>Reporter: Jinhua Fu
>Priority: Minor
> Attachments: job page.png, screenshot-1.png
>
>
> When spark.speculation is enabled and speculative tasks exist, the succeeded 
> task count on the job page includes the speculative tasks, but the count on 
> the job detail page (the job stages page) does not.
> When the job page shows more succeeded tasks than total tasks, which suggests 
> some tasks ran slowly, I want to know which tasks and why; but I have to 
> check every stage to find the speculative tasks, because speculative tasks 
> are not included in each stage's succeeded task count.
> Can this be improved?






[jira] [Updated] (SPARK-20591) Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist

2017-05-04 Thread Jinhua Fu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinhua Fu updated SPARK-20591:
--
Attachment: job page.png

> Succeeded tasks num not equal in job page and job detail page on spark web ui 
> when speculative task(s) exist
> 
>
> Key: SPARK-20591
> URL: https://issues.apache.org/jira/browse/SPARK-20591
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.2
>Reporter: Jinhua Fu
>Priority: Minor
> Attachments: job page.png
>
>
> When spark.speculation is enabled and speculative tasks exist, the succeeded 
> task count on the job page includes the speculative tasks, but the count on 
> the job detail page (the job stages page) does not.
> When the job page shows more succeeded tasks than total tasks, which suggests 
> some tasks ran slowly, I want to know which tasks and why; but I have to 
> check every stage to find the speculative tasks, because speculative tasks 
> are not included in each stage's succeeded task count.
> Can this be improved?






[jira] [Created] (SPARK-20591) Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist

2017-05-03 Thread Jinhua Fu (JIRA)
Jinhua Fu created SPARK-20591:
-

 Summary: Succeeded tasks num not equal in job page and job detail 
page on spark web ui when speculative task(s) exist
 Key: SPARK-20591
 URL: https://issues.apache.org/jira/browse/SPARK-20591
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 2.0.2
Reporter: Jinhua Fu


When spark.speculation is enabled and speculative tasks exist, the succeeded 
task count on the job page includes the speculative tasks, but the count on the 
job detail page (the job stages page) does not.
When the job page shows more succeeded tasks than total tasks, which suggests 
some tasks ran slowly, I want to know which tasks and why; but I have to check 
every stage to find the speculative tasks, because speculative tasks are not 
included in each stage's succeeded task count.
Can this be improved?
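The mismatch can be modeled as follows (a minimal sketch of the two counting rules described above; this is not Spark's actual UI code, and the event tuples are illustrative):

```python
# Hypothetical model of the two counters (not Spark's actual UI code).
# Each successful task-end event is (stage_id, task_index, speculative).
events = [
    (0, 0, False), (0, 1, False),
    (0, 1, True),   # a speculative copy of task 1 also succeeds
]

# Job page: counts every successful task-end event, speculative copies included.
job_page_succeeded = len(events)

# Stage page: counts each task index at most once, so the duplicate is dropped.
stage_page_succeeded = len({(stage, idx) for stage, idx, _ in events})

print(job_page_succeeded, stage_page_succeeded)  # 3 2
```

Under this model the job page reports 3 succeeded tasks while the stage page reports 2, matching the discrepancy described in the report.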






[jira] [Updated] (SPARK-20150) Add permsize statistics for worker memory which may be very useful for the memory usage assessment

2017-03-29 Thread Jinhua Fu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinhua Fu updated SPARK-20150:
--
Summary: Add permsize statistics for worker memory which may be very useful 
for the memory usage assessment  (was: Can the spark add a mechanism for 
permsize statistics which may be very useful for the memory usage assessment)

> Add permsize statistics for worker memory which may be very useful for the 
> memory usage assessment
> --
>
> Key: SPARK-20150
> URL: https://issues.apache.org/jira/browse/SPARK-20150
> Project: Spark
>  Issue Type: Wish
>  Components: Web UI
>Affects Versions: 2.0.2
>Reporter: Jinhua Fu
>
> It seems worker memory is only assigned to the executor heap, which is 
> usually not enough for estimating the whole cluster's memory usage, 
> especially when memory becomes a bottleneck of the cluster. In many cases, we 
> found an executor's real memory usage was much larger than its heap size, 
> which forces me to check every application's real memory expenditure.
> This could be improved by adding a mechanism for non-heap (PermSize) 
> statistics, shown only as extra memory usage, with no effect on the current 
> worker memory allocation and statistics. The PermSize can be obtained easily 
> from the executor's Java options.






[jira] [Created] (SPARK-20150) Can the spark add a mechanism for permsize statistics which may be very useful for the memory usage assessment

2017-03-29 Thread Jinhua Fu (JIRA)
Jinhua Fu created SPARK-20150:
-

 Summary: Can the spark add a mechanism for permsize statistics 
which may be very useful for the memory usage assessment
 Key: SPARK-20150
 URL: https://issues.apache.org/jira/browse/SPARK-20150
 Project: Spark
  Issue Type: Wish
  Components: Web UI
Affects Versions: 2.0.2
Reporter: Jinhua Fu


It seems worker memory is only assigned to the executor heap, which is usually 
not enough for estimating the whole cluster's memory usage, especially when 
memory becomes a bottleneck of the cluster. In many cases, we found an 
executor's real memory usage was much larger than its heap size, which forces 
me to check every application's real memory expenditure.

This could be improved by adding a mechanism for non-heap (PermSize) 
statistics, shown only as extra memory usage, with no effect on the current 
worker memory allocation and statistics. The PermSize can be obtained easily 
from the executor's Java options.
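As a sketch of how the PermSize could be read from the executor's Java options (a hypothetical helper, named `perm_size_mb` here for illustration; it is not part of Spark, and the 64 MB fallback is an assumed default):

```python
import re

# Hypothetical helper (not part of Spark): parse -XX:MaxPermSize from the
# executor's Java options string, e.g. spark.executor.extraJavaOptions.
def perm_size_mb(java_options, default_mb=64):
    m = re.search(r"-XX:MaxPermSize=(\d+)([kKmMgG]?)", java_options)
    if not m:
        return default_mb  # fall back to an assumed default when the flag is absent
    value, unit = int(m.group(1)), m.group(2).lower()
    # Normalize bytes / kilobytes / megabytes / gigabytes to megabytes.
    to_mb = {"": 1.0 / (1024 * 1024), "k": 1.0 / 1024, "m": 1.0, "g": 1024.0}
    return int(value * to_mb[unit])

print(perm_size_mb("-server -XX:MaxPermSize=256m"))  # 256
```

A UI column populated this way would only require reading the existing configuration, consistent with the claim that it needs no change to worker memory allocation.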






[jira] [Comment Edited] (SPARK-20120) spark-sql CLI support silent mode

2017-03-28 Thread Jinhua Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944635#comment-15944635
 ] 

Jinhua Fu edited comment on SPARK-20120 at 3/28/17 6:51 AM:


Good idea, I agree with you!
The "-S" option does not seem to take effect.


was (Author: jinhua fu):
Good idea, I agree with you!

> spark-sql CLI support silent mode
> -
>
> Key: SPARK-20120
> URL: https://issues.apache.org/jira/browse/SPARK-20120
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Yuming Wang
>
> It is similar to Hive silent mode, showing just the query result. See:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli






[jira] [Commented] (SPARK-20120) spark-sql CLI support silent mode

2017-03-28 Thread Jinhua Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944635#comment-15944635
 ] 

Jinhua Fu commented on SPARK-20120:
---

Good idea, I agree with you!

> spark-sql CLI support silent mode
> -
>
> Key: SPARK-20120
> URL: https://issues.apache.org/jira/browse/SPARK-20120
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Yuming Wang
>
> It is similar to Hive silent mode, showing just the query result. See:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli


