Github user mayya-sharipova commented on the issue:

    https://github.com/apache/bahir/pull/45
  
    @emlaver 
    I am seeing the following unexpected behaviour:
    I have a database with 13 docs and 1 deleted doc. `df.count` returns `14`,
    which is incorrect: the deleted doc is being counted. When I display the
    dataframe, the last record is the deleted doc, with `airportName` set to null:
    +--------+---+--------------------+-----------+
    |_deleted|_id|                _rev|airportName|
    +--------+---+--------------------+-----------+
    |    null|DEL|1-67f14f8891a9f32...|      Delhi|
    |    null|JFK|1-ee8206c8e56a114...|   New York|
    |    null|SVO|1-7d18769b68f6099...|     Moscow|
    |    null|FRA|1-f358b62b0499340...|  Frankfurt|
    |    null|HKG|1-b040e40df5d0080...|  Hong Kong|
    |    null|CDG|1-8c51e401185272e...|      Paris|
    |    null|FCO|1-89431c8db8aa8e4...|       Rome|
    |    null|NRT|1-dce312ac1414110...|      Tokyo|
    |    null|LHR|1-303c622ad8380c9...|     London|
    |    null|BOM|2-a3f39a0741938c4...|    Mumbaii|
    |    null|YUL|1-19a9fe9cace23ec...|   Montreal|
    |    null|IKA|1-3dea74452ca86af...|     Tehran|
    |    null|SIN|1-67037272289432e...|  Singapore|
    |    true|SYD|2-1cc4f2c62db144a...|       null|
    +--------+---+--------------------+-----------+
    
    We should NOT load any deleted documents into the dataframe. A user may have
    thousands or millions of deleted documents. We should load only undeleted docs,
    and the dataframe should NOT have a `"_deleted"` column.
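
    As a rough illustration of the behaviour requested above, here is a minimal
    Python sketch that skips deleted docs and strips the `_deleted` field before
    any rows are built. The helper name `live_docs` and the sample docs
    (with shortened revisions) are hypothetical; this is not the connector's
    actual code, only the filtering rule expressed on plain dicts.

```python
def live_docs(docs):
    """Return non-deleted docs with the `_deleted` field stripped.

    Hypothetical helper for illustration; not part of sql-cloudant.
    """
    result = []
    for doc in docs:
        # Deleted docs are tombstones flagged with `_deleted: true`; skip them.
        if doc.get("_deleted") is True:
            continue
        # Drop the `_deleted` key so it never becomes a dataframe column.
        result.append({k: v for k, v in doc.items() if k != "_deleted"})
    return result


# Hypothetical sample shaped like the output above (revisions shortened).
docs = [
    {"_id": "SIN", "_rev": "1-6703", "airportName": "Singapore"},
    {"_id": "SYD", "_rev": "2-1cc4", "_deleted": True},
]

rows = live_docs(docs)
print(len(rows))        # only the undeleted doc remains
print(sorted(rows[0]))  # remaining keys, with no `_deleted`
```

    With this rule applied before the dataframe is created, the count would
    reflect only live docs and the inferred schema would have no `_deleted`
    column.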
    
    ________
    Another error:
    Occasionally, when running the `CloudantDF.py` example, I get an error:
      File "/Cloudant/bahir/sql-cloudant/examples/python/CloudantDF.py", line 45, in <module>
        df.filter(df.airportName >= 'Moscow').select("_id",'airportName').show()
      File "..spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1020, in __getattr__
    AttributeError: 'DataFrame' object has no attribute 'airportName'
    
    For this PR, we can disregard this error and investigate it further in
    follow-up PRs.
    _____________