This is the same problem: you're writing a query that joins 3 tables
together, and then applying a "LIMIT 20" to that query. If you look
carefully at your 20 rows of psql output, I expect you'll see the same
aggregates_id appear more than once. There are less than 20 distinct
Aggregate objects. When SQLAlchemy receives those rows, it skips the
duplicates.
The SQL looks something like this:
SELECT ... FROM aggregates
LEFT JOIN aggregate_blocks ON ...
LEFT JOIN blocks ON ...
LIMIT 20
You need the LIMIT to apply just to the aggregates table, rather than
the joined tables. This is a bit tricky because you want to filter by
a field in one of those joined tables. You could use an EXISTS
function, something like this:
SELECT * FROM aggregates
WHERE EXISTS (
SELECT 1
FROM aggregate_blocks
INNER JOIN blocks ON aggregate_blocks.block_id = block.id
WHERE aggregate_blocks.aggregate_id = aggregates.id
AND blocks.is_complete = false
)
LIMIT 20
If you run that in psql, I think you should get 20 *different*
aggregates rows back.
If that works, then to turn it into SQLAlchemy ORM syntax, you should
use Query.exists():
https://docs.sqlalchemy.org/en/13/orm/query.html#sqlalchemy.orm.query.Query.exists
Hope that helps,
Simon
On Fri, Aug 14, 2020 at 3:56 PM Prerna Pandit wrote:
>
> Hey Simon,
>
> Thanks so much for replying to my question. I reworked my code to use
> sqlalchemy ORM and took off flask and paginate so I can narrow down the
> issue. My models now extend from declarative_base.
>
> engine =
> create_engine('postgresql://postgres:postgres@localhost:5400/postgres')
> Session = sessionmaker(bind=engine)
>
> Base = declarative_base()
> session = Session()
>
> Models
>
> class Aggregate(Base):
> __tablename__ = 'aggregates'
> id = Column(UUID(as_uuid=True), primary_key=True,
>server_default=text('uuid_generate_v4()'))
> site_id = Column(UUID(as_uuid=True), nullable=True)
> created_at = Column(DateTime, default=sa.func.now())
> created_by = Column(UUID(as_uuid=True), nullable=True)
> updated_at = Column(DateTime, default=sa.func.now(),
> onupdate=sa.func.now())
> updated_by = Column(UUID(as_uuid=True), nullable=True)
> blocks = relationship('AggregateBlock', cascade='all, delete-orphan',
> passive_deletes=True,
> back_populates='aggregate')
>
>
> class Block(Base):
>
> __tablename__ = 'blocks'
> id = Column(UUID(as_uuid=True), primary_key=True,
>server_default=text('uuid_generate_v4()'))
> type = Column(Text, nullable=False)
> heading = Column(Text, nullable=True)
> subheading = Column(Text, nullable=True)
> label = Column(Text, nullable=True)
> is_complete = Column(Boolean, default=False)
> created_at = Column(DateTime, default=sa.func.now())
> created_by = Column(UUID(as_uuid=True), nullable=True)
> updated_at = Column(DateTime, default=sa.func.now(),
> onupdate=sa.func.now())
> updated_by = Column(UUID(as_uuid=True), nullable=True)
> aggregates = relationship('AggregateBlock', cascade='all, delete-orphan',
> passive_deletes=True, back_populates='block')
>
>
> class AggregateBlock(Base):
> __tablename__ = 'aggregate_blocks'
> id = Column(UUID(as_uuid=True), primary_key=True,
>server_default=text('uuid_generate_v4()'))
> block_id = Column(UUID(as_uuid=True),
> ForeignKey('blocks.id', ondelete='CASCADE'),
> nullable=False, index=True)
> aggregate_id = Column(UUID(as_uuid=True),
> ForeignKey('aggregates.id',
> ondelete='RESTRICT'), nullable=False)
> position = Column(Integer, nullable=False)
> block = relationship('Block', back_populates='aggregates')
> aggregate = relationship('Aggregate', back_populates='blocks')
>
>
> Query:
>
> select = session.query(Aggregate).order_by(Aggregate.created_at) \
> .join(AggregateBlock) \
> .join(Block) \
> .filter(Block.is_complete == complete) \
>
> all_results = select.all()
> limit_results = select.limit(20).all()
>
>
> I still get inconsistent results when I apply limit. Like select.all() will
> return 47 rows but with limit it'll return anywhere between 11 to 15. If I
> take the generated SQL query and run it directly in psql, I get the correct
> count.
>
> SELECT aggregates.id AS aggregates_id, aggregates.site_id AS
> aggregates_site_id, aggregates.created_at AS aggregates_created_at,
> aggregates.created_by AS aggregates_created_by, aggregates.updated_at AS
> aggregates_updated_at, aggregates.updated_by AS aggregates_updated_by
> FROM aggregates JOIN aggregate_blocks ON aggregates.id =
> aggregate_blocks.aggregate_id JOIN blocks ON blocks.id =
> aggregate_blocks.block_id
> WHERE blocks.is_complete = false ORDER BY aggregates.created_at
>
> LIMIT 20
>
>
>
>
>
> On Friday, August 14, 2020 at 10:08:23 AM UTC-4, Simon King