[ https://issues.apache.org/jira/browse/SPARK-44564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruifeng Zheng updated SPARK-44564:
----------------------------------
    Attachment: docstr_prompt_only.py

> Refine the documents with LLM
> -----------------------------
>
>                 Key: SPARK-44564
>                 URL: https://issues.apache.org/jira/browse/SPARK-44564
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Documentation
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
>         Attachments: docstr_prompt_only.py
>
>
> Let's first focus on the documentation of *PySpark DataFrame APIs*.
> *1*, Choose a subset of DataFrame APIs.
> Since review bandwidth is limited, we recommend that each PR contain at least 5 APIs.
> *2*, For each API, copy-paste the function (including its signature and docstring) into an LLM, and ask it to improve the docstring with a prompt (e.g. the attached prompt). You can of course use or design your own prompt.
> For prompt engineering, you can refer to these [Best practices|https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api].
> *3*, Note that the LLM is not 100% reliable; the generated docstring may still contain mistakes, e.g.:
> * The example code cannot run
> * The example results are incorrect
> * The example code doesn't match the example title
> * The description uses the wrong version
> * A 'Raises' section is added for a non-existent exception
> * Lint checks fail
> * ...
> We need to fix these before sending a PR.
> We can try different prompts, pick the good parts, and combine them into the new docstring.
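Step *2* boils down to pasting a function's signature and docstring into a prompt. The sketch below assembles that text with Python's `inspect` module. The instruction wording in `PROMPT_TEMPLATE` and the helper `build_docstring_prompt` are hypothetical (the real prompt is in the attached docstr_prompt_only.py and is not reproduced here), and `head` is a toy stand-in for a DataFrame method so the sketch runs without Spark.

```python
import inspect
import textwrap

# Hypothetical instruction text; the actual prompt in docstr_prompt_only.py
# is not reproduced here.
PROMPT_TEMPLATE = (
    "Improve the docstring of the following PySpark DataFrame API.\n"
    "Keep the existing sections and make every example a runnable doctest.\n\n"
    "{source}\n"
)


def build_docstring_prompt(func):
    """Hypothetical helper: wrap func's signature and docstring in PROMPT_TEMPLATE."""
    header = f"def {func.__name__}{inspect.signature(func)}:"
    doc = inspect.getdoc(func) or ""
    # Indent the docstring so the snippet reads like the original source.
    body = textwrap.indent(f'"""\n{doc}\n"""', "    ")
    return PROMPT_TEMPLATE.format(source=f"{header}\n{body}")


# Toy stand-in for a DataFrame method, so the sketch runs without Spark.
def head(n: int = 5) -> list:
    """Return the first ``n`` rows."""
    raise NotImplementedError


print(build_docstring_prompt(head))
```

The sketch uses `inspect.signature` and `inspect.getdoc` rather than `inspect.getsource`, because `getsource` raises `OSError` when the source file is unavailable (e.g. in an interactive session). The output is only the prompt text. Sending it to a model is left to whatever LLM client you use.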
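Two of the mistakes listed in step *3* (example code that cannot run, and incorrect example results) can be caught mechanically by running the `>>>` examples as doctests before sending a PR. Below is a minimal sketch using the standard `doctest` module. The helper name `check_docstring_examples` is hypothetical, and the plain-Python examples stand in for real PySpark ones. Actual PySpark examples would also need a `spark` session supplied through `globs`, which this sketch assumes rather than creates.

```python
import doctest


def check_docstring_examples(docstring, globs=None, name="generated"):
    """Run the >>> examples in a docstring; return (failed, attempted) counts."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(docstring, dict(globs or {}), name, None, 0)
    runner = doctest.DocTestRunner(verbose=False,
                                   optionflags=doctest.NORMALIZE_WHITESPACE)
    # Discard the failure report text; only the counts are returned.
    results = runner.run(test, out=lambda text: None)
    return results.failed, results.attempted


# Plain-Python examples standing in for LLM-generated PySpark docstrings.
good = """
>>> sorted([3, 1, 2])
[1, 2, 3]
"""
wrong_result = """
>>> sorted([3, 1, 2])
[3, 2, 1]
"""
cannot_run = """
>>> df.show()
"""

print(check_docstring_examples(good))          # (0, 1)
print(check_docstring_examples(wrong_result))  # (1, 1)
print(check_docstring_examples(cannot_run))    # (1, 1)
```

This does not cover the remaining items in the list (examples that don't match their titles, wrong versions, spurious 'Raises' sections, lint). Those still need a human review of the generated docstring.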