Re: [PR] [SPARK-48037][CORE][3.4] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-08 Thread via GitHub


dongjoon-hyun commented on code in PR #46464:
URL: https://github.com/apache/spark/pull/46464#discussion_r1594250748


##
.github/workflows/build_and_test.yml:
##
@@ -644,6 +644,7 @@ jobs:
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 
'sphinx-copybutton==0.5.2' nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1' 
'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 
'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 
'sphinxcontrib-serializinghtml==1.1.5' 'nest-asyncio==1.5.8' 'rpds-py==0.16.2' 
'alabaster==0.7.13'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
 python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 
'pyarrow==12.0.1' pandas 'plotly>=4.8'
+python3.9 -m pip install 'nbsphinx==0.9.3'

Review Comment:
   > Do we need to backport 
[SPARK-48179](https://issues.apache.org/jira/browse/SPARK-48179) to branch 3.4?
   
   I believe it's too late and would be redundant.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48037][CORE][3.4] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-08 Thread via GitHub


cxzl25 commented on code in PR #46464:
URL: https://github.com/apache/spark/pull/46464#discussion_r1594158117


##
.github/workflows/build_and_test.yml:
##
@@ -644,6 +644,7 @@ jobs:
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 
'sphinx-copybutton==0.5.2' nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1' 
'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 
'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 
'sphinxcontrib-serializinghtml==1.1.5' 'nest-asyncio==1.5.8' 'rpds-py==0.16.2' 
'alabaster==0.7.13'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
 python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 
'pyarrow==12.0.1' pandas 'plotly>=4.8'
+python3.9 -m pip install 'nbsphinx==0.9.3'

Review Comment:
   Thanks @dongjoon-hyun! I didn’t know why CI failed on the 3.4 branch at 
first, so I tested it in my own way.
   
   Do we need to backport SPARK-48179 to branch 3.4?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48037][CORE][3.4] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-08 Thread via GitHub


dongjoon-hyun commented on code in PR #46464:
URL: https://github.com/apache/spark/pull/46464#discussion_r1594151959


##
.github/workflows/build_and_test.yml:
##
@@ -644,6 +644,7 @@ jobs:
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 
'sphinx-copybutton==0.5.2' nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1' 
'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 
'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 
'sphinxcontrib-serializinghtml==1.1.5' 'nest-asyncio==1.5.8' 'rpds-py==0.16.2' 
'alabaster==0.7.13'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
 python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 
'pyarrow==12.0.1' pandas 'plotly>=4.8'
+python3.9 -m pip install 'nbsphinx==0.9.3'

Review Comment:
   Oops. I missed this line. My bad. We should proceed this separately because 
this this the following, @cxzl25 .
   - #46448



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48037][CORE][3.4] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-08 Thread via GitHub


cxzl25 commented on code in PR #46464:
URL: https://github.com/apache/spark/pull/46464#discussion_r1594150601


##
.github/workflows/build_and_test.yml:
##
@@ -644,6 +644,7 @@ jobs:
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 
'sphinx-copybutton==0.5.2' nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1' 
'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 
'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 
'sphinxcontrib-serializinghtml==1.1.5' 'nest-asyncio==1.5.8' 'rpds-py==0.16.2' 
'alabaster==0.7.13'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
 python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 
'pyarrow==12.0.1' pandas 'plotly>=4.8'
+python3.9 -m pip install 'nbsphinx==0.9.3'

Review Comment:
   Do we need to pin the nbsphinx version in the master branch as well? Similar 
to SPARK-39421 . cc @HyukjinKwon



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48037][CORE][3.4] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-08 Thread via GitHub


dongjoon-hyun closed pull request #46464: [SPARK-48037][CORE][3.4] Fix 
SortShuffleWriter lacks shuffle write related metrics resulting in potentially 
inaccurate data
URL: https://github.com/apache/spark/pull/46464


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48037][CORE][3.4] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-08 Thread via GitHub


dongjoon-hyun commented on PR #46464:
URL: https://github.com/apache/spark/pull/46464#issuecomment-2100725260

   Merged to branch-3.4.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48037][CORE][3.4] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-08 Thread via GitHub


cxzl25 commented on code in PR #46464:
URL: https://github.com/apache/spark/pull/46464#discussion_r1593987337


##
.github/workflows/build_and_test.yml:
##
@@ -644,6 +644,7 @@ jobs:
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 
'sphinx-copybutton==0.5.2' nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1' 
'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 
'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 
'sphinxcontrib-serializinghtml==1.1.5' 'nest-asyncio==1.5.8' 'rpds-py==0.16.2' 
'alabaster==0.7.13'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
 python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 
'pyarrow==12.0.1' pandas 'plotly>=4.8'
+python3.9 -m pip install 'nbsphinx==0.9.3'

Review Comment:
   
https://github.com/cxzl25/spark/actions/runs/8997219681/job/24725778387#step:24:6423
   ```
   Exception occurred:
 File "/usr/local/lib/python3.9/dist-packages/nbsphinx/__init__.py", line 
1316, in apply
   for section in self.document.findall(docutils.nodes.section):
   AttributeError: 'document' object has no attribute 'findall'
   ```
   
   The failed CI uses nbsphinx 0.9.4 version, which requires docutils >= 0.18.1.
   
   https://github.com/spatialaudio/nbsphinx/releases/tag/0.9.4
   
   Release 
   0.9.4 May 7, 2024
   0.9.3 Aug 27, 2023



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48037][CORE][3.4] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-08 Thread via GitHub


cxzl25 commented on code in PR #46464:
URL: https://github.com/apache/spark/pull/46464#discussion_r1593987337


##
.github/workflows/build_and_test.yml:
##
@@ -644,6 +644,7 @@ jobs:
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 
'sphinx-copybutton==0.5.2' nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1' 
'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 
'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 
'sphinxcontrib-serializinghtml==1.1.5' 'nest-asyncio==1.5.8' 'rpds-py==0.16.2' 
'alabaster==0.7.13'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
 python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 
'pyarrow==12.0.1' pandas 'plotly>=4.8'
+python3.9 -m pip install 'nbsphinx==0.9.3'

Review Comment:
   
https://github.com/cxzl25/spark/actions/runs/8997219681/job/24725778387#step:24:6423
   ```
   Exception occurred:
 File "/usr/local/lib/python3.9/dist-packages/nbsphinx/__init__.py", line 
1316, in apply
   for section in self.document.findall(docutils.nodes.section):
   AttributeError: 'document' object has no attribute 'findall'
   ```
   
   The failed CI uses nbsphinx 0.9.4 version, which requires docutils >= 0.18.1.
   
   https://github.com/spatialaudio/nbsphinx/releases/tag/0.9.4



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org