[ https://issues.apache.org/jira/browse/SPARK-41989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefaan Lippens updated SPARK-41989:
------------------------------------
    Description: 
In {{python/pyspark/pandas/__init__.py}} there is currently a warning that is emitted
when the {{PYARROW_IGNORE_TIMEZONE}} env var is not set
(https://github.com/apache/spark/blob/187c4a9c66758e973633c5c309b551b1d9094e6e/python/pyspark/pandas/__init__.py#L44-L59):

{code:python}
    import logging

    logging.warning(
        "'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to "...
{code}

The module-level {{logging.warning()}} function silently calls {{logging.basicConfig()}}
when the root logger has no handlers yet (observed with Python 3.9, which I tried).
(FYI: calling the method on a named logger, e.g. {{logging.getLogger(...).warning()}},
does not make this implicit call.)
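A small standard-library sketch of the difference described above (the logger name {{some.library}} and the helper {{reset_root_logger}} are illustrative, not from pyspark). It counts the root logger's handlers after each kind of call, clearing them in between:

{code:python}
import logging


def reset_root_logger() -> None:
    """Remove all handlers from the root logger so each check starts clean."""
    root = logging.getLogger()
    for handler in list(root.handlers):
        root.removeHandler(handler)


# Case 1: a named logger's .warning() leaves the root logger untouched.
# (With no handlers anywhere, the message goes to logging.lastResort.)
reset_root_logger()
logging.getLogger("some.library").warning("named-logger warning")
handlers_after_named = len(logging.getLogger().handlers)

# Case 2: the module-level logging.warning() implicitly calls basicConfig(),
# which attaches a StreamHandler to the root logger.
reset_root_logger()
logging.warning("module-level warning")
handlers_after_module = len(logging.getLogger().handlers)

print(handlers_after_named, handlers_after_module)  # 0 1
{code}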


This has a side effect that is very hard to track down:
importing {{pyspark.pandas}} (directly, or indirectly through another package) can break your
logging setup when {{PYARROW_IGNORE_TIMEZONE}} is not set.
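One way a library could emit such a warning without touching the root logger is to go through a named logger. The sketch below is hypothetical (the helper {{warn_if_timezone_env_missing}} and the logger name are illustrative, not pyspark's actual code); {{warnings.warn(...)}} would be another option that leaves logging configuration alone.

{code:python}
import logging
import os

# Hypothetical module-level logger; the name is illustrative.
logger = logging.getLogger("pyspark.pandas")


def warn_if_timezone_env_missing() -> bool:
    """Warn through a named logger if PYARROW_IGNORE_TIMEZONE is unset.

    Unlike the module-level logging.warning(), Logger.warning() never calls
    logging.basicConfig(), so the root logger is left for the application
    to configure. Returns True if the warning was emitted.
    """
    if "PYARROW_IGNORE_TIMEZONE" in os.environ:
        return False
    logger.warning("'PYARROW_IGNORE_TIMEZONE' environment variable was not set.")
    return True


# Usage: with the variable unset, the warning is emitted, but the number of
# root handlers is unchanged afterwards.
os.environ.pop("PYARROW_IGNORE_TIMEZONE", None)
root_handlers_before = len(logging.getLogger().handlers)
emitted = warn_if_timezone_env_missing()
root_handlers_after = len(logging.getLogger().handlers)
{code}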

A minimal example (assuming {{PYARROW_IGNORE_TIMEZONE}} is not set):

{code:python}
import logging
import pyspark.pandas

logging.basicConfig(level=logging.DEBUG)

logger = logging.getLogger("test")
logger.warning("I warn you")
logger.debug("I debug you")
{code}

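This produces only the warning, not the debug line: the implicit {{logging.basicConfig()}} call during the import
has already attached a handler to the root logger, so the application's own {{logging.basicConfig(level=logging.DEBUG)}}
call is a no-op and the root logger stays at its default WARNING level.
Without the {{import pyspark.pandas}} line, the debug line is printed as well.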
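The failure mode can be reproduced without pyspark. This sketch assumes a bare {{logging.warning()}} call is an adequate stand-in for what the import does (the helper {{reset_root_logger}} is illustrative). It shows the later {{basicConfig()}} being ignored, and the {{force=True}} parameter (available since Python 3.8) as one application-side workaround. Setting {{PYARROW_IGNORE_TIMEZONE}} before the import would avoid the warning, and therefore the implicit configuration, altogether.

{code:python}
import logging


def reset_root_logger() -> None:
    """Clear root handlers and restore the default root level (WARNING)."""
    root = logging.getLogger()
    for handler in list(root.handlers):
        root.removeHandler(handler)
    root.setLevel(logging.WARNING)


# Stand-in for "import pyspark.pandas" with PYARROW_IGNORE_TIMEZONE unset:
# a module-level logging.warning() call at import time.
reset_root_logger()
logging.warning("simulated import-time warning")

# The application's configuration is silently ignored, because basicConfig()
# does nothing when the root logger already has handlers.
logging.basicConfig(level=logging.DEBUG)
level_without_force = logging.getLogger().level  # still WARNING

# Workaround: force=True removes existing root handlers before configuring.
logging.basicConfig(level=logging.DEBUG, force=True)
level_with_force = logging.getLogger().level  # now DEBUG
{code}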


> PYARROW_IGNORE_TIMEZONE warning can break application logging setup
> -------------------------------------------------------------------
>
>                 Key: SPARK-41989
>                 URL: https://issues.apache.org/jira/browse/SPARK-41989
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.3
>         Environment: python 3.9 env with pyspark installed
>            Reporter: Stefaan Lippens
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
