[jira] [Issue Comment Deleted] (SPARK-7148) Configure Parquet block size (row group size) for ML model import/export

[ https://issues.apache.org/jira/browse/SPARK-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanbo Liang updated SPARK-7148:
-------------------------------
Comment: was deleted
(was: [~josephkb] If you are busy with other issues, please don't hesitate to assign it to me.)

Configure Parquet block size (row group size) for ML model import/export
-------------------------------------------------------------------------

Key: SPARK-7148
URL: https://issues.apache.org/jira/browse/SPARK-7148
Project: Spark
Issue Type: Improvement
Components: MLlib, SQL
Affects Versions: 1.3.0, 1.3.1, 1.4.0
Reporter: Joseph K. Bradley
Priority: Minor

It would be nice if we could configure the Parquet buffer size when using the Parquet format for ML model import/export. Currently, for some models (trees and ensembles), the schema has 13+ columns. With a default buffer size of 128 MB (I think), the allocated buffer far exceeds the default memory made available by run-example. Because of this, users have to fall back to spark-submit and explicitly request more memory in order to run some ML examples.

Is there a simple way to specify {{parquet.block.size}}? I'm not familiar with this part of Spark SQL.
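For reference, a minimal sketch of one way to pass the setting through, assuming the Parquet writer honors {{parquet.block.size}} from the Hadoop configuration attached to the SparkContext; the app name, the 1 MB value, and the save path are all illustrative:

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("parquet-block-size-sketch"))

// Parquet allocates its write buffers per row group; the default row group
// (block) size is 128 MB. Lowering it before the save keeps the buffers small.
// Assumption: the Parquet output format reads "parquet.block.size" (in bytes)
// from this Hadoop configuration when the model is written out.
sc.hadoopConfiguration.setInt("parquet.block.size", 1024 * 1024) // 1 MB

// `model` stands for any trained MLlib model with Parquet-backed export,
// e.g. a DecisionTreeModel; the path is illustrative.
// model.save(sc, "target/tmp/myModel")
{code}

If that works, the model export code could set it internally with a small default so run-example stays within its default memory budget.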