I am using the Python C API to load the Gutenberg corpus from the nltk library and iterate through the sentences. The Python code I am trying to replicate is:

    from nltk.corpus import gutenberg
    for i, fileid in enumerate(gutenberg.fileids()):
        sentences = gutenberg.sents(fileid)
        etc

where gutenberg.fileids is, of course, iterable.

I use the following C API code to import the module and get pointers:

    int64_t Call_PyModule()
    {
        PyObject *pModule, *pName, *pSubMod, *pFidMod, *pFidSeqIter, *pSentMod;

        pName = PyUnicode_FromString("nltk.corpus");
        pModule = PyImport_Import(pName);

        if (pModule == 0x0){
            PyErr_Print();
            return 1; }

        pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
        pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
        pSentMod = PyObject_GetAttrString(pSubMod, "sents");

        pFidIter = PyObject_GetIter(pFidMod);
        int ckseq_ok = PySeqIter_Check(pFidMod);
        pFidSeqIter = PySeqIter_New(pFidMod);

        return 0;
    }

pSubMod, pFidMod and pSentMod all return valid pointers, but the iterator lines return zero:

    pFidIter = PyObject_GetIter(pFidMod);
    int ckseq_ok = PySeqIter_Check(pFidMod);
    pFidSeqIter = PySeqIter_New(pFidMod);

So the C API thinks gutenberg.fileids is not iterable, but it is. What am I doing wrong?
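
One thing worth checking, offered as a sketch and not a confirmed diagnosis: the Python loop iterates over the result of calling gutenberg.fileids(), while the C code hands the attribute fileids itself to PyObject_GetIter. A bound method is callable but not iterable, so asking for an iterator over it fails. The snippet below shows the difference using a hypothetical stand-in class, FakeCorpus, which is not part of nltk and only exists so the example runs without the corpus installed.

```python
class FakeCorpus:
    """Hypothetical stand-in for the gutenberg corpus reader (not the nltk API)."""

    def fileids(self):
        # Illustrative file names only; the real reader returns its own list.
        return ["austen-emma.txt", "bible-kjv.txt"]


def is_iterable(obj):
    """Return True if iter(obj) succeeds, False if it raises TypeError."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


corpus = FakeCorpus()

# The method object itself: callable, but iter() on it raises TypeError.
method_is_iterable = is_iterable(corpus.fileids)

# The value returned by calling the method: a list, which is iterable.
result_is_iterable = is_iterable(corpus.fileids())

print(method_is_iterable, result_is_iterable)  # prints: False True
```

If this is indeed the cause, the C equivalent would presumably be to call the attribute first, for example with PyObject_CallObject(pFidMod, NULL), and pass the object it returns to PyObject_GetIter.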