[jira] [Updated] (SPARK-44264) DeepSpeed Distributor
[ https://issues.apache.org/jira/browse/SPARK-44264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani updated SPARK-44264:
---------------------------------------------
    Attachment: Trying to Run Deepspeed Funcs.html

> DeepSpeed Distributor
> ---------------------
>
>                 Key: SPARK-44264
>                 URL: https://issues.apache.org/jira/browse/SPARK-44264
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, PySpark
>    Affects Versions: 3.4.1
>            Reporter: Lu Wang
>            Priority: Critical
>             Fix For: 3.5.0
>
>         Attachments: Trying to Run Deepspeed Funcs.html
>
> Make it easier for PySpark users to run distributed training and inference
> with DeepSpeed on Spark clusters. This was a project determined by the
> Databricks ML Training Team.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42103) Add Instrumentation
[ https://issues.apache.org/jira/browse/SPARK-42103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani resolved SPARK-42103.
----------------------------------------------
    Resolution: Not A Problem

> Add Instrumentation
> -------------------
>
>                 Key: SPARK-42103
>                 URL: https://issues.apache.org/jira/browse/SPARK-42103
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> Adding instrumentation
[jira] [Resolved] (SPARK-41590) Implement Baseline API Code
[ https://issues.apache.org/jira/browse/SPARK-41590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani resolved SPARK-41590.
----------------------------------------------
    Resolution: Fixed

> Implement Baseline API Code
> ---------------------------
>
>                 Key: SPARK-41590
>                 URL: https://issues.apache.org/jira/browse/SPARK-41590
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> Creating a baseline API so that we can agree on how the users will interact
> with the code. This was determined in this
> [Design Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit]
> and can be updated as necessary.
[jira] [Updated] (SPARK-41916) Address General Fixes
[ https://issues.apache.org/jira/browse/SPARK-41916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani updated SPARK-41916:
---------------------------------------------
    Description:
        We want the distributor to have the ability to run multiple torchrun
        processes per task if task.gpu.amount > 1.

        We want to add a check that `import torch` does not raise an
        ImportError, since the TorchDistributor requires torch. If it does
        raise one, we will give the user more details.

    was:
        We want the distributor to have the ability to run multiple torchrun
        processes per task if task.gpu.amount > 1.

> Address General Fixes
> ---------------------
>
>                 Key: SPARK-41916
>                 URL: https://issues.apache.org/jira/browse/SPARK-41916
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> We want the distributor to have the ability to run multiple torchrun
> processes per task if task.gpu.amount > 1.
>
> We want to add a check that `import torch` does not raise an ImportError,
> since the TorchDistributor requires torch. If it does raise one, we will
> give the user more details.
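The `import torch` check described above might look like the following sketch; the helper name and error message are illustrative assumptions, not the actual TorchDistributor code:

```python
def check_torch_installed():
    """Fail fast with a helpful message when torch is absent (sketch)."""
    try:
        import torch  # noqa: F401  # TorchDistributor requires torch
    except ImportError as e:
        # Re-raise with actionable detail instead of a bare ImportError.
        raise ImportError(
            "TorchDistributor requires PyTorch. Install it on every node, "
            "e.g. `pip install torch`, and retry."
        ) from e
```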
[jira] [Updated] (SPARK-41916) Address General Fizes
[ https://issues.apache.org/jira/browse/SPARK-41916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani updated SPARK-41916:
---------------------------------------------
    Summary: Address General Fizes  (was: Address `spark.task.resource.gpu.amount > 1`)

> Address General Fizes
> ---------------------
>
>                 Key: SPARK-41916
>                 URL: https://issues.apache.org/jira/browse/SPARK-41916
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> We want the distributor to have the ability to run multiple torchrun
> processes per task if task.gpu.amount > 1.
[jira] [Resolved] (SPARK-41776) Implement support for PyTorch Lightning
[ https://issues.apache.org/jira/browse/SPARK-41776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani resolved SPARK-41776.
----------------------------------------------
    Resolution: Fixed

Not needed, since we are now using `torch.distributed.run`

> Implement support for PyTorch Lightning
> ---------------------------------------
>
>                 Key: SPARK-41776
>                 URL: https://issues.apache.org/jira/browse/SPARK-41776
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> This requires us to just call train() on each Spark task separately without
> much preprocessing or postprocessing because PyTorch Lightning handles that
> by itself.
[jira] [Updated] (SPARK-41916) Address `spark.task.resource.gpu.amount > 1`
[ https://issues.apache.org/jira/browse/SPARK-41916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani updated SPARK-41916:
---------------------------------------------
    Description:
        We want the distributor to have the ability to run multiple torchrun
        processes per task if task.gpu.amount > 1.

    was:
        We want the distributor to have the ability to run multiple torchrun
        processes per task if task.gpu.amount > 1 + address formatting
        comments on
        https://github.com/apache/spark/pull/39188#discussion_r1068903058

> Address `spark.task.resource.gpu.amount > 1`
> --------------------------------------------
>
>                 Key: SPARK-41916
>                 URL: https://issues.apache.org/jira/browse/SPARK-41916
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> We want the distributor to have the ability to run multiple torchrun
> processes per task if task.gpu.amount > 1.
[jira] [Updated] (SPARK-41916) Address General Fixes
[ https://issues.apache.org/jira/browse/SPARK-41916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani updated SPARK-41916:
---------------------------------------------
    Summary: Address General Fixes  (was: Address General Fizes)

> Address General Fixes
> ---------------------
>
>                 Key: SPARK-41916
>                 URL: https://issues.apache.org/jira/browse/SPARK-41916
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> We want the distributor to have the ability to run multiple torchrun
> processes per task if task.gpu.amount > 1.
[jira] [Updated] (SPARK-41916) Address `spark.task.resource.gpu.amount > 1`
[ https://issues.apache.org/jira/browse/SPARK-41916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani updated SPARK-41916:
---------------------------------------------
    Description:
        We want the distributor to have the ability to run multiple torchrun
        processes per task if task.gpu.amount > 1 + address formatting
        comments on
        https://github.com/apache/spark/pull/39188#discussion_r1068903058

    was:
        We want the distributor to have the ability to run multiple torchrun
        processes per task if task.gpu.amount > 1.

> Address `spark.task.resource.gpu.amount > 1`
> --------------------------------------------
>
>                 Key: SPARK-41916
>                 URL: https://issues.apache.org/jira/browse/SPARK-41916
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> We want the distributor to have the ability to run multiple torchrun
> processes per task if task.gpu.amount > 1 + address formatting comments on
> https://github.com/apache/spark/pull/39188#discussion_r1068903058
[jira] [Updated] (SPARK-41776) Implement support for PyTorch Lightning
[ https://issues.apache.org/jira/browse/SPARK-41776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani updated SPARK-41776:
---------------------------------------------
    Description:
        This requires us to just call train() on each Spark task separately
        without much preprocessing or postprocessing because PyTorch
        Lightning handles that by itself.

        Update: This was resolved by using `torch.distributed.run`

    was:
        This requires us to just call train() on each Spark task separately
        without much preprocessing or postprocessing because PyTorch
        Lightning handles that by itself.

> Implement support for PyTorch Lightning
> ---------------------------------------
>
>                 Key: SPARK-41776
>                 URL: https://issues.apache.org/jira/browse/SPARK-41776
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> This requires us to just call train() on each Spark task separately without
> much preprocessing or postprocessing because PyTorch Lightning handles that
> by itself.
>
> Update: This was resolved by using `torch.distributed.run`
[jira] [Updated] (SPARK-41776) Implement support for PyTorch Lightning
[ https://issues.apache.org/jira/browse/SPARK-41776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani updated SPARK-41776:
---------------------------------------------
    Description:
        This requires us to just call train() on each Spark task separately
        without much preprocessing or postprocessing because PyTorch
        Lightning handles that by itself.

    was:
        This requires us to just call train() on each Spark task separately
        without much preprocessing or postprocessing because PyTorch
        Lightning handles that by itself.

        Update: This was resolved by using `torch.distributed.run`

> Implement support for PyTorch Lightning
> ---------------------------------------
>
>                 Key: SPARK-41776
>                 URL: https://issues.apache.org/jira/browse/SPARK-41776
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> This requires us to just call train() on each Spark task separately without
> much preprocessing or postprocessing because PyTorch Lightning handles that
> by itself.
[jira] [Resolved] (SPARK-41915) Change API so that the user doesn't have to explicitly set pytorch-lightning
[ https://issues.apache.org/jira/browse/SPARK-41915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani resolved SPARK-41915.
----------------------------------------------
    Resolution: Fixed

This is already resolved within https://issues.apache.org/jira/browse/SPARK-41590.

> Change API so that the user doesn't have to explicitly set pytorch-lightning
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-41915
>                 URL: https://issues.apache.org/jira/browse/SPARK-41915
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> Remove the `framework` parameter from the API and have cloudpickle
> automatically determine whether the user code depends on PyTorch Lightning.
[jira] [Created] (SPARK-42103) Add Instrumentation
Rithwik Ediga Lakhamsani created SPARK-42103:
---------------------------------------------

             Summary: Add Instrumentation
                 Key: SPARK-42103
                 URL: https://issues.apache.org/jira/browse/SPARK-42103
             Project: Spark
          Issue Type: Sub-task
          Components: ML, PySpark
    Affects Versions: 3.4.0
            Reporter: Rithwik Ediga Lakhamsani


Adding instrumentation
[jira] [Updated] (SPARK-41775) Implement training functions as input
[ https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani updated SPARK-41775:
---------------------------------------------
    Description:
        Sidenote: make formatting updates described in
        https://github.com/apache/spark/pull/39188

        Currently, `Distributor().run(...)` takes only files as input. Now we
        will add in additional functionality to take in functions as well.
        This will require us to go through the following process on each task
        in the executor nodes:

        1. Take the input function and args and pickle them.
        2. Create a temp train.py file that looks like

        {code:python}
        import cloudpickle
        import os

        if __name__ == "__main__":
            with open(f"{tempdir}/train_input.pkl", "rb") as f:
                train, args = cloudpickle.load(f)
            output = train(*args)
            if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
                with open(f"{tempdir}/train_output.pkl", "wb") as f:
                    cloudpickle.dump(output, f)
        {code}

        3. Run that train.py file with `torchrun`.
        4. Check if `train_output.pkl` has been created by the process with
        partitionId == 0; if it has, deserialize it and return that output
        through `.collect()`.

    was: the same process description, without the sidenote.

> Implement training functions as input
> -------------------------------------
>
>                 Key: SPARK-41775
>                 URL: https://issues.apache.org/jira/browse/SPARK-41775
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> Sidenote: make formatting updates described in
> https://github.com/apache/spark/pull/39188
>
> Currently, `Distributor().run(...)` takes only files as input. Now we will
> add in additional functionality to take in functions as well. This will
> require us to go through the following process on each task in the executor
> nodes:
> 1. Take the input function and args and pickle them.
> 2. Create a temp train.py file that looks like
> {code:python}
> import cloudpickle
> import os
>
> if __name__ == "__main__":
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f)
> {code}
> 3. Run that train.py file with `torchrun`.
> 4. Check if `train_output.pkl` has been created by the process with
> partitionId == 0; if it has, deserialize it and return that output through
> `.collect()`.
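The four steps above can be sketched end to end as follows. This is an illustrative helper, not the actual TorchDistributor code: `run_on_task` is a hypothetical name, stdlib `pickle` stands in for cloudpickle (so the function must be importable by name), and a plain Python subprocess with `RANK` pinned to "0" stands in for a torchrun launch.

```python
import os
import pickle  # the real flow uses cloudpickle so closures/lambdas work too
import subprocess
import sys
import tempfile
import textwrap


def run_on_task(train_fn, *args):
    """Pickle train_fn + args, run them in a temp train.py, collect output."""
    with tempfile.TemporaryDirectory() as tempdir:
        # 1. Pickle the input function and its args.
        with open(os.path.join(tempdir, "train_input.pkl"), "wb") as f:
            pickle.dump((train_fn, args), f)
        # 2. Write a temp train.py that loads and runs them.
        script = textwrap.dedent(f"""\
            import os, pickle
            tempdir = {tempdir!r}
            with open(os.path.join(tempdir, "train_input.pkl"), "rb") as f:
                train, args = pickle.load(f)
            output = train(*args)
            if output and os.environ.get("RANK", "") == "0":
                with open(os.path.join(tempdir, "train_output.pkl"), "wb") as f:
                    pickle.dump(output, f)
        """)
        train_py = os.path.join(tempdir, "train.py")
        with open(train_py, "w") as f:
            f.write(script)
        # 3. Run it (torchrun in the real flow; RANK pinned to "0" here).
        subprocess.run([sys.executable, train_py], check=True,
                       env={**os.environ, "RANK": "0"})
        # 4. If rank 0 wrote an output file, deserialize and return it.
        out_path = os.path.join(tempdir, "train_output.pkl")
        if os.path.exists(out_path):
            with open(out_path, "rb") as f:
                return pickle.load(f)
        return None
```

For example, `run_on_task(operator.add, 2, 3)` round-trips the function through the temp script and returns `5`.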
[jira] [Created] (SPARK-41916) Address `spark.task.resource.gpu.amount > 1`
Rithwik Ediga Lakhamsani created SPARK-41916:
---------------------------------------------

             Summary: Address `spark.task.resource.gpu.amount > 1`
                 Key: SPARK-41916
                 URL: https://issues.apache.org/jira/browse/SPARK-41916
             Project: Spark
          Issue Type: Sub-task
          Components: ML, PySpark
    Affects Versions: 3.4.0
            Reporter: Rithwik Ediga Lakhamsani


We want the distributor to have the ability to run multiple torchrun
processes per task if task.gpu.amount > 1.
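One way the per-task worker count could be derived from that setting is sketched below; the helper name and the dict standing in for SparkConf are illustrative, though `spark.task.resource.gpu.amount` is the real configuration key:

```python
def local_world_size(spark_conf: dict) -> int:
    """Sketch: how many torchrun worker processes one Spark task launches.

    `spark_conf` is a plain dict standing in for a resolved SparkConf.
    """
    # The value is configured as a string and may be fractional in Spark,
    # so parse via float before truncating to a whole process count.
    gpus_per_task = int(float(spark_conf.get("spark.task.resource.gpu.amount", "0")))
    # One worker per GPU assigned to the task; fall back to a single
    # CPU-only worker when the task has no GPUs.
    return max(gpus_per_task, 1)
```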
[jira] [Created] (SPARK-41915) Change API so that the user doesn't have to explicitly set pytorch-lightning
Rithwik Ediga Lakhamsani created SPARK-41915:
---------------------------------------------

             Summary: Change API so that the user doesn't have to explicitly set pytorch-lightning
                 Key: SPARK-41915
                 URL: https://issues.apache.org/jira/browse/SPARK-41915
             Project: Spark
          Issue Type: Sub-task
          Components: ML, PySpark
    Affects Versions: 3.4.0
            Reporter: Rithwik Ediga Lakhamsani


Remove the `framework` parameter from the API and have cloudpickle
automatically determine whether the user code depends on PyTorch Lightning.
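One heuristic for such automatic detection, sketched below under stated assumptions: `references_module` is a hypothetical helper, and scanning pickle opcodes for module names is just one possible approach, not the implementation this ticket landed on. Since cloudpickle emits standard pickle opcodes, the stdlib `pickletools` can scan a serialized payload for references to a module such as `pytorch_lightning`:

```python
import pickletools


def references_module(payload: bytes, module_name: str) -> bool:
    """Heuristic: does this pickle payload reference globals from module_name?"""
    for opcode, arg, _pos in pickletools.genops(payload):
        # Protocols 0-3 store "module name" as a single GLOBAL argument.
        if opcode.name == "GLOBAL" and arg is not None and arg.split()[0] == module_name:
            return True
        # Protocol 4+ pushes the module as a unicode string before STACK_GLOBAL.
        if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE") and arg == module_name:
            return True
    return False
```

For instance, pickling a function from a module makes that module's name appear in the opcode stream, so a distributor could decide framework-specific behavior without a user-supplied `framework` parameter.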
[jira] [Updated] (SPARK-41775) Implement training functions as input
[ https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani updated SPARK-41775:
---------------------------------------------
    Description:
        Currently, `Distributor().run(...)` takes only files as input. Now we
        will add in additional functionality to take in functions as well.
        This will require us to go through the following process on each task
        in the executor nodes:

        1. Take the input function and args and pickle them.
        2. Create a temp train.py file that looks like

        {code:python}
        import cloudpickle
        import os

        if __name__ == "__main__":
            with open(f"{tempdir}/train_input.pkl", "rb") as f:
                train, args = cloudpickle.load(f)
            output = train(*args)
            if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
                with open(f"{tempdir}/train_output.pkl", "wb") as f:
                    cloudpickle.dump(output, f)
        {code}

        3. Run that train.py file with `torchrun`.
        4. Check if `train_output.pkl` has been created by the process with
        partitionId == 0; if it has, deserialize it and return that output
        through `.collect()`.

    was: the same process description, with step 4 ending at "return that
    output" (without the `.collect()` mention).

> Implement training functions as input
> -------------------------------------
>
>                 Key: SPARK-41775
>                 URL: https://issues.apache.org/jira/browse/SPARK-41775
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> Currently, `Distributor().run(...)` takes only files as input. Now we will
> add in additional functionality to take in functions as well. This will
> require us to go through the following process on each task in the executor
> nodes:
> 1. Take the input function and args and pickle them.
> 2. Create a temp train.py file that looks like
> {code:python}
> import cloudpickle
> import os
>
> if __name__ == "__main__":
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f)
> {code}
> 3. Run that train.py file with `torchrun`.
> 4. Check if `train_output.pkl` has been created by the process with
> partitionId == 0; if it has, deserialize it and return that output through
> `.collect()`.
[jira] [Updated] (SPARK-41775) Implement training functions as input
[ https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani updated SPARK-41775:
---------------------------------------------
    Description:
        Currently, `Distributor().run(...)` takes only files as input. Now we
        will add in additional functionality to take in functions as well.
        This will require us to go through the following process on each task
        in the executor nodes:

        1. Take the input function and args and pickle them.
        2. Create a temp train.py file that looks like

        {code:python}
        import cloudpickle
        import os

        if __name__ == "__main__":
            with open(f"{tempdir}/train_input.pkl", "rb") as f:
                train, args = cloudpickle.load(f)
            output = train(*args)
            if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
                with open(f"{tempdir}/train_output.pkl", "wb") as f:
                    cloudpickle.dump(output, f)
        {code}

        3. Run that train.py file with `torchrun`.
        4. Check if `train_output.pkl` has been created by the process with
        partitionId == 0; if it has, deserialize it and return that output.

    was: the same process description, with the code snippet in ``` fences
    instead of a {code} block.

> Implement training functions as input
> -------------------------------------
>
>                 Key: SPARK-41775
>                 URL: https://issues.apache.org/jira/browse/SPARK-41775
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> Currently, `Distributor().run(...)` takes only files as input. Now we will
> add in additional functionality to take in functions as well. This will
> require us to go through the following process on each task in the executor
> nodes:
> 1. Take the input function and args and pickle them.
> 2. Create a temp train.py file that looks like
> {code:python}
> import cloudpickle
> import os
>
> if __name__ == "__main__":
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f)
> {code}
> 3. Run that train.py file with `torchrun`.
> 4. Check if `train_output.pkl` has been created by the process with
> partitionId == 0; if it has, deserialize it and return that output.
[jira] [Updated] (SPARK-41775) Implement training functions as input
[ https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani updated SPARK-41775:
---------------------------------------------
    Description:
        Currently, `Distributor().run(...)` takes only files as input. Now we
        will add in additional functionality to take in functions as well.
        This will require us to go through the following process on each task
        in the executor nodes:

        1. Take the input function and args and pickle them.
        2. Create a temp train.py file that looks like

        {code:python}
        import cloudpickle
        import os

        if __name__ == "__main__":
            with open(f"{tempdir}/train_input.pkl", "rb") as f:
                train, args = cloudpickle.load(f)
            output = train(*args)
            if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
                with open(f"{tempdir}/train_output.pkl", "wb") as f:
                    cloudpickle.dump(output, f)
        {code}

        3. Run that train.py file with `torchrun`.
        4. Check if `train_output.pkl` has been created by the process with
        partitionId == 0; if it has, deserialize it and return that output.

    was: the same process description (whitespace-only change).

> Implement training functions as input
> -------------------------------------
>
>                 Key: SPARK-41775
>                 URL: https://issues.apache.org/jira/browse/SPARK-41775
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> Currently, `Distributor().run(...)` takes only files as input. Now we will
> add in additional functionality to take in functions as well. This will
> require us to go through the following process on each task in the executor
> nodes:
> 1. Take the input function and args and pickle them.
> 2. Create a temp train.py file that looks like
> {code:python}
> import cloudpickle
> import os
>
> if __name__ == "__main__":
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f)
> {code}
> 3. Run that train.py file with `torchrun`.
> 4. Check if `train_output.pkl` has been created by the process with
> partitionId == 0; if it has, deserialize it and return that output.
[jira] [Updated] (SPARK-41775) Implement training functions as input
[ https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani updated SPARK-41775:
---------------------------------------------
    Description:
        Currently, `Distributor().run(...)` takes only files as input. Now we
        will add in additional functionality to take in functions as well.
        This will require us to go through the following process on each task
        in the executor nodes:

        1. Take the input function and args and pickle them.
        2. Create a temp train.py file that looks like

        ```
        import cloudpickle
        import os

        if __name__ == "__main__":
            with open(f"{tempdir}/train_input.pkl", "rb") as f:
                train, args = cloudpickle.load(f)
            output = train(*args)
            if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
                with open(f"{tempdir}/train_output.pkl", "wb") as f:
                    cloudpickle.dump(output, f)
        ```

        3. Run that train.py file with `torchrun`.
        4. Check if `train_output.pkl` has been created by the process with
        partitionId == 0; if it has, deserialize it and return that output.

    was: the same process description, with the code snippet in ```python
    fences.

> Implement training functions as input
> -------------------------------------
>
>                 Key: SPARK-41775
>                 URL: https://issues.apache.org/jira/browse/SPARK-41775
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> Currently, `Distributor().run(...)` takes only files as input. Now we will
> add in additional functionality to take in functions as well. This will
> require us to go through the following process on each task in the executor
> nodes:
> 1. Take the input function and args and pickle them.
> 2. Create a temp train.py file that looks like
> ```
> import cloudpickle
> import os
>
> if __name__ == "__main__":
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f)
> ```
> 3. Run that train.py file with `torchrun`.
> 4. Check if `train_output.pkl` has been created by the process with
> partitionId == 0; if it has, deserialize it and return that output.
[jira] [Created] (SPARK-41777) Add Integration Tests
Rithwik Ediga Lakhamsani created SPARK-41777:
---------------------------------------------

             Summary: Add Integration Tests
                 Key: SPARK-41777
                 URL: https://issues.apache.org/jira/browse/SPARK-41777
             Project: Spark
          Issue Type: Sub-task
          Components: ML, PySpark
    Affects Versions: 3.4.0
            Reporter: Rithwik Ediga Lakhamsani


This requires us to add PyTorch as a testing dependency.
[jira] [Created] (SPARK-41776) Implement support for PyTorch Lightning
Rithwik Ediga Lakhamsani created SPARK-41776:
---------------------------------------------

             Summary: Implement support for PyTorch Lightning
                 Key: SPARK-41776
                 URL: https://issues.apache.org/jira/browse/SPARK-41776
             Project: Spark
          Issue Type: Sub-task
          Components: ML, PySpark
    Affects Versions: 3.4.0
            Reporter: Rithwik Ediga Lakhamsani


This requires us to just call train() on each Spark task separately without
much preprocessing or postprocessing because PyTorch Lightning handles that
by itself.
[jira] [Updated] (SPARK-41775) Implement training functions as input
[ https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rithwik Ediga Lakhamsani updated SPARK-41775:
---------------------------------------------
    Component/s: ML

> Implement training functions as input
> -------------------------------------
>
>                 Key: SPARK-41775
>                 URL: https://issues.apache.org/jira/browse/SPARK-41775
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Rithwik Ediga Lakhamsani
>            Priority: Major
>
> Currently, `Distributor().run(...)` takes only files as input. Now we will
> add in additional functionality to take in functions as well. This will
> require us to go through the following process on each task in the executor
> nodes:
> 1. Take the input function and args and pickle them.
> 2. Create a temp train.py file that looks like
> ```python
> import cloudpickle
> import os
>
> if __name__ == "__main__":
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f)
> ```
> 3. Run that train.py file with `torchrun`.
> 4. Check if `train_output.pkl` has been created by the process with
> partitionId == 0; if it has, deserialize it and return that output.
[jira] [Created] (SPARK-41775) Implement training functions as input
Rithwik Ediga Lakhamsani created SPARK-41775:
Summary: Implement training functions as input
Key: SPARK-41775
URL: https://issues.apache.org/jira/browse/SPARK-41775
Project: Spark
Issue Type: Sub-task
Components: PySpark
Affects Versions: 3.4.0
Reporter: Rithwik Ediga Lakhamsani

Currently, `Distributor().run(...)` takes only files as input. We will now add functionality to accept functions as well. This requires the following process on each task in the executor nodes:
1. Take the input function and args and pickle them.
2. Create a temp train.py file that looks like:
```python
import cloudpickle
import os

if __name__ == "__main__":
    # tempdir is filled in by the wrapper that generates this file
    with open(f"{tempdir}/train_input.pkl", "rb") as f:
        train, args = cloudpickle.load(f)
    output = train(*args)
    if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
        with open(f"{tempdir}/train_output.pkl", "wb") as f:
            cloudpickle.dump(output, f)
```
3. Run that train.py file with `torchrun`.
4. Check whether `train_output.pkl` has been created by the process with partitionId == 0; if it has, deserialize it and return that output.
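The pickle round trip in steps 1, 2, and 4 above can be sketched end to end in one process. This is an illustrative sketch only: the `train` function and file names are stand-ins, and it uses the stdlib `pickle` so it runs self-contained (the ticket needs `cloudpickle`, a third-party library, to also handle closures and lambdas).

```python
import os
import pickle  # stand-in for cloudpickle; sufficient for module-level functions
import tempfile


def train(x, y):
    # Stand-in for a user-supplied training function.
    return x + y


tempdir = tempfile.mkdtemp()

# Step 1: pickle the function and its args before launching the task.
with open(os.path.join(tempdir, "train_input.pkl"), "wb") as f:
    pickle.dump((train, (1, 2)), f)

# What the generated train.py would do: load, invoke, and (on rank 0 only,
# i.e. partitionId == 0) persist the result.
with open(os.path.join(tempdir, "train_input.pkl"), "rb") as f:
    fn, args = pickle.load(f)
output = fn(*args)
if output is not None and os.environ.get("RANK", "0") == "0":
    with open(os.path.join(tempdir, "train_output.pkl"), "wb") as f:
        pickle.dump(output, f)

# Step 4: deserialize the output file and return its contents.
with open(os.path.join(tempdir, "train_output.pkl"), "rb") as f:
    print(pickle.load(f))  # 3
```

Note that `pickle` serializes a module-level function by reference (module plus qualified name), which is why the real implementation would use `cloudpickle` to serialize the function body itself for execution in a separate `torchrun` process.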
[jira] [Updated] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rithwik Ediga Lakhamsani updated SPARK-41589: - Description: This is a project to make it easier for PySpark users to distribute PyTorch code using PySpark. The corresponding [Design Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing] can give more context. This was a project determined by the Databricks ML Training Team; please reach out to [~gurwls223] (Spark-side) or [~erithwik] for more context. (was: This is a project to make it easier for PySpark users to distribute PyTorch code using PySpark. The corresponding [Design Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing] can give more context. This was a project determined by the Databricks ML Training Team; please reach out to [~gurwls223] (Spark-side proxy) or [~erithwik] for more context.) > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side) or [~erithwik] > for more context.
[jira] [Commented] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649526#comment-17649526 ] Rithwik Ediga Lakhamsani commented on SPARK-41589: -- [~xkrogen] I created a new copy, please let me know if you still can't see it. Thank you for your patience! > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side proxy) or > [~erithwik] for more context.
[jira] [Updated] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rithwik Ediga Lakhamsani updated SPARK-41589: - Description: This is a project to make it easier for PySpark users to distribute PyTorch code using PySpark. The corresponding [Design Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing] can give more context. This was a project determined by the Databricks ML Training Team; please reach out to [~gurwls223] (Spark-side proxy) or [~erithwik] for more context. (was: This is a project to make it easier for PySpark users to distribute PyTorch code using PySpark. The corresponding [Design Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] and [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] can give more context. This was a project determined by the Databricks ML Training Team; please reach out to [~gurwls223] (Spark-side proxy) or [~erithwik] for more context.) > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side proxy) or > [~erithwik] for more context.
[jira] [Commented] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649516#comment-17649516 ] Rithwik Ediga Lakhamsani commented on SPARK-41589: -- Sorry, I need to update it with a new copy. I will add a new comment on this ticket when the new document is available. > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] > and > [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side proxy) or > [~erithwik] for more context.
[jira] [Comment Edited] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649511#comment-17649511 ] Rithwik Ediga Lakhamsani edited comment on SPARK-41589 at 12/20/22 12:27 AM: - Oh sorry, let me fix that! Does it work now [~xkrogen]? was (Author: JIRAUSER298573): Oh sorry, let me fix that! > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] > and > [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side proxy) or > [~erithwik] for more context.
[jira] [Updated] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rithwik Ediga Lakhamsani updated SPARK-41589: - Description: This is a project to make it easier for PySpark users to distribute PyTorch code using PySpark. The corresponding [Design Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] and [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] can give more context. This was a project determined by the Databricks ML Training Team; please reach out to [~gurwls223] (Spark-side proxy) or [~erithwik] for more context. (was: This is a project to make it easier for PySpark users to distribute PyTorch code using PySpark. The corresponding [Design Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] and [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] can give more context. ) > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] > and > [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side proxy) or > [~erithwik] for more context.
[jira] [Commented] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649511#comment-17649511 ] Rithwik Ediga Lakhamsani commented on SPARK-41589: -- Oh sorry, let me fix that! > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] > and > [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] > can give more context.
[jira] [Created] (SPARK-41592) Implement functionality for training a PyTorch file on the executors
Rithwik Ediga Lakhamsani created SPARK-41592: Summary: Implement functionality for training a PyTorch file on the executors Key: SPARK-41592 URL: https://issues.apache.org/jira/browse/SPARK-41592 Project: Spark Issue Type: Sub-task Components: ML Affects Versions: 3.4.0 Reporter: Rithwik Ediga Lakhamsani
[jira] [Created] (SPARK-41593) Implement logging from the executor nodes
Rithwik Ediga Lakhamsani created SPARK-41593: Summary: Implement logging from the executor nodes Key: SPARK-41593 URL: https://issues.apache.org/jira/browse/SPARK-41593 Project: Spark Issue Type: Sub-task Components: ML Affects Versions: 3.4.0 Reporter: Rithwik Ediga Lakhamsani
[jira] [Created] (SPARK-41591) Implement functionality for training a PyTorch file locally
Rithwik Ediga Lakhamsani created SPARK-41591: Summary: Implement functionality for training a PyTorch file locally Key: SPARK-41591 URL: https://issues.apache.org/jira/browse/SPARK-41591 Project: Spark Issue Type: Sub-task Components: ML Affects Versions: 3.4.0 Reporter: Rithwik Ediga Lakhamsani
[jira] [Commented] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649509#comment-17649509 ] Rithwik Ediga Lakhamsani commented on SPARK-41589: -- I am working on this. > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] > and > [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] > can give more context.
[jira] [Updated] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rithwik Ediga Lakhamsani updated SPARK-41589: - Description: This is a project to make it easier for PySpark users to distribute PyTorch code using PySpark. The corresponding [Design Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] and [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] can give more context. > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] > and > [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] > can give more context.
[jira] [Created] (SPARK-41590) Implement Baseline API Code
Rithwik Ediga Lakhamsani created SPARK-41590: Summary: Implement Baseline API Code Key: SPARK-41590 URL: https://issues.apache.org/jira/browse/SPARK-41590 Project: Spark Issue Type: Sub-task Components: ML Affects Versions: 3.4.0 Reporter: Rithwik Ediga Lakhamsani Creating a baseline API so that we can agree on how the users will interact with the code. This was determined in this [Design Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] and can be updated as necessary.
[jira] [Created] (SPARK-41589) PyTorch Distributor
Rithwik Ediga Lakhamsani created SPARK-41589: Summary: PyTorch Distributor Key: SPARK-41589 URL: https://issues.apache.org/jira/browse/SPARK-41589 Project: Spark Issue Type: Umbrella Components: ML Affects Versions: 3.4.0 Reporter: Rithwik Ediga Lakhamsani